Meet the Faculty - Yao-Yi Chiang
Tell us about your journey to the University of Minnesota.
I was a research associate professor at the University of Southern California (USC), where I worked at the Spatial Sciences Institute. My research area is spatial artificial intelligence (AI): we build machine learning methods and systems and apply them to spatial data in order to understand human-environment interactions from all kinds of sources, such as old maps, satellite imagery, onsite sensors, street photos, census data, and map datasets. One of the main reasons I came to the University of Minnesota is the Department’s strong spatial computing faculty, including Shashi Shekhar and Mohamed Mokbel. I felt this was a natural environment for me to continue my work with some great collaborators.
How did you become interested in computer science and your specific field?
I started to work on geographical datasets a long time ago when I was an undergraduate student. My first project was to take vector lines, which are the geometries of things in space, and write a C++ library to perform intersections and unions of all of these geometries. I started to work on larger datasets to see if we could do this more efficiently and then started to work on new problems. For example, how can we take two datasets and integrate them? After I graduated, I came to the U.S. for my master’s degree in computer science, where I started to look at a variety of problems in spatial data, like computer vision for historical map scans and integration systems for map datasets.
My advisor at USC was Craig Knoblock. We worked on a project with the U.S. Geological Survey. I love maps, and they provided us with a lot of interesting maps to use for our research. I worked on computer algorithms to help us understand what is going on in different maps. First, we extract information from scanned images. Then, we use that information in combination with other datasets to create knowledge and useful connections. I have been working in this field for a long time. Last year, my students and I teamed up with the USC team and won first place in the DARPA AI for Critical Mineral Assessment Competition.
Tell us more about your current research!
We are working on spatial data in three lines of work. The first area is spatial-temporal phenomena, or changes in space and time. The machine learning problem we are working on is how to take a limited collection of samples of these phenomena and use it to predict and forecast. One application is air quality prediction. If you want to understand air quality in a city, you can place sensors around it, but these physical sensors usually have limitations on where you can place them and can only collect data for a limited time. So, how can you use the information you collect to understand what is going on long term at a very fine spatial scale? Some pollutants have high spatial variation: you move a little bit and the pollutant concentration changes significantly. So we are using data science techniques to fill in the blanks. We are trying to understand the underlying physics that could model the entire spatial-temporal phenomenon and use a data-driven approach to supplement the things that we do not know.
The second problem we are working on is trying to understand what is going on on Earth using multimodal datasets. Scanned historical maps can tell us about the history of a piece of land, while contemporary projections can tell us about how people moved across space. Technology like Google Maps, satellite imagery, and other open-source maps and sensors tells us many things about interesting locations on Earth. These are all different data formats, and each gives the user one particular perspective. We are using machine learning techniques to combine this information. Now if we see a trajectory, we can understand the moving behavior and the purpose of the move. This will help us simulate large numbers of trajectories in a megacity and detect anomalies in a group of trajectories. I work in collaboration with Shashi Shekhar on this cool project.
We also recently finished a project with the David Rumsey Map Center at Stanford. We received a donation from their foundation to fund a project that takes historical map scans and uses machine learning techniques to read the maps and generate text labels. We were able to generate over 90 million text labels, so now users can search labels to find historical locations without having to look at every map. Our results are used in the David Rumsey Map Collection to support thousands of daily users.
The third line of research is spatial text in natural language documents. We are trying to understand why people use a certain name for something locally when that name might have a different meaning somewhere else. For example, in Minnesota, people say you can “park your car on the ramp.” This was confusing for me because in California, a “ramp” is a highway entrance or exit. So how can we enable a machine learning model, like ChatGPT, to understand this type of nuance between locations? We are working with USC to train a large language model with geographical datasets so the model can identify the semantic type of a point of interest.
What courses are you currently teaching?
I am currently teaching data mining. It is a graduate-level course, but senior undergraduates can also take it. In lecture, we talk about the foundations of algorithms so that students can understand how each algorithm is designed. The assignments are very practical. We give the students large datasets, and they have to use Spark, a big data platform, to build systems, identify interesting patterns, and make real-world impact. I taught this course for a long time at USC, and this is my first time teaching it at the U of M.
I am also teaching a seminar course on spatial AI. We cover machine learning techniques and spatial data basics so that students can use them on their own to solve real problems.
What can students expect to get out of your courses?
For the data mining course, students will learn how to handle large datasets and find interesting patterns. They will take a problem involving a large dataset, learn how to choose an algorithm and model for it, and write the code that solves the problem.
For the spatial AI course, students will learn how to handle different types of spatial data, and select and develop the appropriate machine learning tools to analyze that data to solve a real problem.
What do you enjoy most about teaching?
It’s very interesting to provide students with a different perspective on algorithms. We read textbooks that describe the algorithms, but they do not tell you how and why an algorithm is designed a certain way. Many algorithms are about optimization towards a specific goal. While a problem may seem challenging, as you add more tools to your toolbox, you will be able to solve any problem someday. My classes are foundational for understanding how these algorithms work. When students come to me and say that the assignments are tough but very useful when interviewing for an internship or full-time position, that makes me really happy.
What do you do outside of the classroom for fun? Favorite spot in the Twin Cities?
I am learning to speak Korean. We have a lot of Korean students, and it also helps me understand how different cultures understand language and use it to talk about things in different ways. In Minnesota, there are a lot of interesting things to do outside. I like hiking and going to different lakes. I really like the winter here. I like shoveling snow. It helps me concentrate and think. Minneapolis has the right amount of winter. In LA, every day felt the same, so I like the changing seasons.
I like hanging out in the North Loop. There are a lot of interesting buildings and I like walking around there and trying different restaurants.