Spring 2025 Data Science Poster Fair - Poster Details

Every year, data science M.S. students present their capstone projects during this event as a part of their degree requirements. 

The poster fair is open to the public, and all interested undergraduate and graduate students, alumni, staff, faculty, and industry professionals are encouraged to attend.

Session 1 Presenters: 10am - 11am

Jacynda Alatoma

Applying Machine Learning Methods to the Cryogenic Dark Matter Search (CDMS) Project

Advisor: Michael Steinbach, Department of Computer Science & Engineering

Abstract: The SuperCDMS experiment employs cryogenic solid-state detectors to search for dark matter interactions. A key challenge in this effort is accurately reconstructing the locations of these interactions within the detectors. This study applies Machine Learning (ML) techniques to improve understanding of the particles by predicting interaction locations based on detector signals. By framing the problem as a regression task, the approach allows for greater flexibility in identifying novel or unexpected event distributions. This work contributes to the FAIR4HEP initiative by advancing open and reproducible AI frameworks in high-energy physics. This study is being developed to test the limits of ML techniques on more complex interaction patterns, fostering broader applications of AI in fundamental high-energy physics research.

Mihir Ashok Momaya

Method to Forecast Kidney Function Over Time

Advisor: Saumya Sinha, Department of Industrial and Systems Engineering

Abstract: Delayed graft function (DGF) is a significant post-transplant complication that affects kidney transplant recipients, often leading to increased morbidity and potential graft loss. This study aims to identify patients with DGF using predefined clinical indicators within seven days post-transplant and assess its long-term impact. Patients are categorized into three groups: those who experience graft loss within one year, after one year, and those without DGF who still experience graft loss.

This retrospective study uses longitudinal de-identified data from electronic health records (EHR) available through the University of Minnesota Clinical and Translational Science Institute (CTSI). The dataset, comprising over 200 million rows across multiple tables, was refined using advanced data science techniques to ensure quality and usability. This involved preprocessing, normalization, handling missing values, structuring data for machine learning applications, and establishing clean data pipelines for future predictive modeling. Key risk factors, including urinalysis results, dialysis history, and biochemical markers, were analyzed to distinguish DGF patients. Statistical and machine learning models were employed to characterize differences between those with DGF-related graft loss and those who retained graft function.

Preliminary findings suggest that DGF patients with early graft loss exhibit distinct clinical profiles compared to those who recover. Understanding these differences can improve early detection, optimize post-transplant care, and potentially enhance graft survival outcomes. Future work involves refining predictive models and validating results across larger cohorts to improve patient stratification and treatment planning.

Ritwick Banerjee

Mood Maps: Causal effects of moods - Depression, Mania and more

Advisor: Erich Kummerfeld, Institute for Health Informatics

Abstract: Bipolar disorder (BD) is a chronic mental health condition characterized by extreme mood fluctuations, encompassing periods of emotional highs (mania) and lows (depression). Investigating how manic and depressive symptoms causally influence one another could generate a symptom hierarchy with implications for refining the current conceptualizations of mood episodes, identifying treatment interventions, and providing insights into the biological underpinnings of the disorder. To analyze symptom relationships in the BD symptom network, we employed Causal Discovery Analysis (CDA) to identify causal relationships among manic and depressive symptoms in the 4 canonical mood states of BD: mania, depression, mixed state, and euthymia. Our analysis utilized data from ten NIMH-funded studies (N=6021 participants, 17,044 mood state observations), which assessed symptoms using the Young Mania Rating Scale and the Montgomery-Asberg Depression Rating Scale. We examined the causal structure of the symptom network in each of the 4 mood states, including whether symptoms with the greatest influence varied across mood states. Our findings indicated that mania, depression, mixed states, and euthymia exhibited distinct causal structures. The mood and energy symptoms characteristic of each mood episode exerted the strongest influence on the respective symptom networks. Interestingly, symptoms that were mainly effects in mood episodes were mainly causes in euthymia. We conclude that mania, depression, and mixed states are not only marked by heightened symptom severity but also by a reconfiguration of the causal structure of the symptom network. 

Sandeep Bhuiya

AI-Powered Resume Optimization & Job Recommendation System

Advisor: Nathaniel Helwig, Department of Psychology, School of Statistics

Abstract: This study presents an AI-powered resume optimization and job recommendation system designed to improve the job search process and its outcomes using machine learning. The system combines natural language processing (NLP), similarity-based machine learning (ML), and generative AI to give job seekers tailored job recommendations and to optimize their application materials. It first analyzes a candidate’s resume and identifies the best-matching jobs using TF-IDF vectorization and cosine similarity scoring. It then optimizes the resume by aligning its content with the given job description through GPT-4-powered enhancements, improving applicant tracking system (ATS) compatibility and relevance to employers. The system also generates personalized cover letters that help job seekers craft a compelling narrative tailored to the specific job description. In addition, a Stable Diffusion-powered model generates professional resume templates to improve the visual appearance of applications. The project supports Apple Metal (MPS) GPU acceleration, which enables high-performance execution on Apple silicon devices. By integrating automation, AI-driven recommendations, and resume enhancement, the project aims to improve job search efficiency and bridge the gap between job seekers and employers looking for data-driven career matching. The system can streamline the job application process, enhance resume visibility, and thereby increase candidates’ hiring success rate.
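The resume-to-job matching step named in the abstract (TF-IDF vectorization followed by cosine similarity scoring) can be illustrated with a minimal sketch. This is not the project's code: the function names, the smoothed IDF variant, and the sample resume and job texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_jobs(resume, jobs):
    """Return (job_index, score) pairs, best match first."""
    vectors = tfidf_vectors([resume] + jobs)
    scores = [cosine(vectors[0], v) for v in vectors[1:]]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical inputs, for illustration only.
resume = "Python machine learning engineer with NLP experience"
jobs = [
    "Machine learning engineer, Python and NLP",
    "Registered nurse for pediatric clinic",
    "Frontend developer, JavaScript and CSS",
]
for index, score in rank_jobs(resume, jobs):
    print(index, round(score, 3))
```

Jobs that share no vocabulary with the resume score zero, which is one reason a real system would likely add stemming, stop-word handling, or embeddings on top of plain TF-IDF.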

Mudit Jantwal

Network Analysis of Psychophysiological and Symptom Interactions

Advisor: Justin Anker, Department of Psychiatry and Behavioral Sciences

Abstract: Recent advancements in network analysis offer a transformative approach to understanding the complex interdependencies inherent in psychopathology. This project proposes a network analysis to explore the intricate relationships among anxiety symptoms, stress biomarkers, and alcohol use disorder outcomes in individuals with comorbid anxiety and alcohol use disorder. Traditional linear models often overlook the bidirectional influences among clinical and physiological variables, prompting the need for a systems-level perspective. Using partial correlation methods, including the EBIC Glasso technique, we will construct a comprehensive network comprising key symptom indicators (e.g., from STAI and BDI assessments) alongside physiological metrics such as cortisol levels, heart rate variability, and startle response. Centrality measures (strength, closeness, and betweenness) will be employed to identify critical "hub" nodes that may disproportionately affect relapse risk and overall symptom severity. Additionally, comparative analyses will examine differences in network structure between subgroups (AUD vs. AUD with comorbid anxiety; relapsed vs. abstinent individuals). Expected deliverables include visual network diagrams and detailed centrality tables, which together aim to uncover novel insights into the mechanisms driving comorbidity and inform targeted interventions. This project strives to advance the understanding of complex clinical interactions and contribute to the development of more effective treatment strategies.
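As a rough illustration of the partial-correlation network and strength centrality mentioned in the abstract, the sketch below converts a precision (inverse covariance) matrix into partial correlations and sums absolute edge weights per node. The precision matrix values and node labels are hypothetical, and the sparse estimation step itself (EBIC Glasso) is not shown; it is assumed to have already produced the matrix.

```python
import math


def partial_correlations(precision):
    """Convert a precision matrix P into partial correlations.

    Uses the standard identity rho_ij = -P_ij / sqrt(P_ii * P_jj) for i != j.
    """
    n = len(precision)
    pcor = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                pcor[i][j] = -precision[i][j] / scale
    return pcor


def strength(weights):
    """Strength centrality: sum of absolute edge weights attached to each node."""
    return [
        sum(abs(w) for j, w in enumerate(row) if j != i)
        for i, row in enumerate(weights)
    ]


# Hypothetical three-node example; zeros stand for edges a sparse estimator removed.
labels = ["anxiety", "cortisol", "hrv"]
precision = [
    [2.0, -0.8, 0.0],
    [-0.8, 2.0, -0.4],
    [0.0, -0.4, 2.0],
]
node_strength = strength(partial_correlations(precision))
hub = labels[node_strength.index(max(node_strength))]
print(hub, [round(s, 2) for s in node_strength])
```

Closeness and betweenness need shortest-path computations over the same weighted graph and are omitted here to keep the sketch short.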

Mitch Kosieradzki

TrajFlow: A Generative Framework for Occupancy Density Estimation Using Normalizing Flows

Advisor: Seongjin Choi, Department of Civil, Environmental, and Geo-Engineering

Abstract: In social robotic systems such as autonomous vehicles, understanding the future motion of surrounding agents is crucial for effective motion planning. However, motion is inherently uncertain, making accurate prediction challenging. In this work, we propose TrajFlow, a generative framework for estimating the occupancy density of dynamic agents. It leverages a causal encoder to extract semantically meaningful trajectory embeddings and a normalizing flow to predict future occupancy densities. Unlike existing approaches, TrajFlow models the marginal distribution of spatial locations rather than the joint distribution of unobserved trajectories. This formulation improves accuracy on trajectory forecasting benchmarks, enables fully continuous sampling, and supports both motion trajectories and occupancy grids, two key representations in motion forecasting. To implement this framework, we introduce a novel architecture based entirely on neural differential equations and provide ablations demonstrating the benefits of a continuous approach over discrete neural networks.

Sri Krishna Vamsi Koneru

Impact of personal nursing on child health status

Advisor: Erich Kummerfeld, Institute for Health Informatics

Abstract: Access to personalized healthcare is important for a child’s overall well-being and long-term health outcomes. This project investigates the causal impact of having a personal healthcare professional on a child's health status using data from the 2023 National Survey of Children's Health (NSCH). Given the observational nature of the dataset, causal inference methods are employed to estimate the Average Treatment Effect (ATE). Data preprocessing includes imputation to address missing values and ensure robustness. I implement Regression Adjustment, Propensity Score Regression, and Propensity Score Stratification techniques to estimate the causal relationship between a child's health status and the presence of a personal healthcare professional, accounting for relevant sociodemographic factors.
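One of the techniques named above, propensity score stratification, can be sketched as follows. The sketch assumes propensity scores have already been estimated (for example by a logistic regression, not shown), and the records are made-up numbers rather than NSCH data. Function and variable names are hypothetical.

```python
from statistics import mean


def stratified_ate(records, n_strata=5):
    """Estimate the ATE by stratifying on the propensity score.

    records: list of (treated, outcome, propensity) tuples, treated in {0, 1}.
    Each stratum's treated-minus-control mean difference is weighted by
    the number of records in that stratum.
    """
    ordered = sorted(records, key=lambda r: r[2])
    n = len(ordered)
    weighted_sum, used = 0.0, 0
    for s in range(n_strata):
        stratum = ordered[s * n // n_strata:(s + 1) * n // n_strata]
        treated = [y for t, y, _ in stratum if t == 1]
        control = [y for t, y, _ in stratum if t == 0]
        if not treated or not control:
            continue  # no overlap in this stratum, so it carries no estimate
        weighted_sum += (mean(treated) - mean(control)) * len(stratum)
        used += len(stratum)
    if used == 0:
        raise ValueError("no stratum contains both treated and control records")
    return weighted_sum / used


# Made-up data: the true effect is 1.0, but treatment is more common
# in the group with the higher baseline outcome (confounding).
records = (
    [(1, 4.0, 0.2)] + [(0, 3.0, 0.2)] * 4
    + [(1, 6.0, 0.8)] * 4 + [(0, 5.0, 0.8)]
)
naive = mean(y for t, y, _ in records if t == 1) - mean(
    y for t, y, _ in records if t == 0
)
print(round(naive, 2), round(stratified_ate(records, n_strata=2), 2))
```

In this toy data the naive difference in means overstates the effect, while the stratified estimate recovers it because each stratum compares children with similar propensity to have a personal healthcare professional.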

Akshara Madhu Suthanan

LLM Based Chatbot for Administrative Use at Carlson

Advisor: Ravi Bapna, Information & Decision Sciences 

Abstract: TBD

Shesha Sai Kumar Reddy Sadu

Density based Colocation Mining

Advisor: Shashi Shekhar, Department of Computer Science & Engineering

Abstract: TBD 

Marcus Spotanski

Video Generation Improvements for Autonomous Vehicles

Advisor: Zhi-Li Zhang, Department of Computer Science & Engineering

Abstract: Autonomous vehicles are a rapidly evolving field. However, there are significant challenges in training and developing the AI models needed for self-driving vehicles, including the collection of driving data, limited availability of existing open-source models, and the computational and operational requirements. A model must respond swiftly to unpredictable scenarios (e.g., collisions, road obstructions), but collecting this kind of real-world data is both dangerous and impractical. To address this, researchers are implementing generative AI driving models that can output short videos or images similar to real driving scenarios. However, most autonomous driving models and features are restricted in their availability for research.

In collaboration with Dr. Zhang’s Computer Networking lab at the University of Minnesota Twin Cities, my capstone project takes steps to address these issues by leveraging MagicDriveDiT, a large generative driving model, to develop proof-of-concept artificially generated video clips and images. We aim to measure the latency and physical requirements needed to generate them, while also preprocessing lab-owned driving data so that it can be used to fine-tune the model, which may later aid further development of an open-source autonomous driving model.

Zheling Yuan

Developing an AI Model for Detecting Prostate Lesions Using Multiparametric MRI (mpMRI)

Advisor: Mingquan Lin, Division of Computational Health Sciences

Abstract: Prostate cancer is one of the most common cancers diagnosed in men, with 1 in 8 men receiving a diagnosis during their lifetime. Multiparametric MRI (mpMRI) and biopsy are the gold standards for diagnosing prostate cancer. While mpMRI helps identify suspicious areas and guides targeted biopsies for tissue analysis, radiologists often face challenges in accurately identifying these areas, resulting in missed diagnoses or unnecessary biopsies. In this project, we propose using 3D Convolutional Neural Network (CNN) deep learning models to classify suspicious areas and enhance diagnostic accuracy. T2-weighted mpMRI images are cropped based on the shape of biopsy needles to simulate real-life criteria and are used to train the model. The performance of the model is evaluated and compared to the Prostate Imaging Reporting and Data System (PI-RADS), based on radiologists' assessments and accuracy relative to the ground truth. 

Session 2 Presenters: 11am - 12pm

Chinmay Arora

Geographical SDOH and Diabetes outcomes

Advisor: Erich Kummerfeld, Institute for Health Informatics

Abstract: Diabetes remains a significant public health challenge, with social determinants of health (SDoH) playing a crucial role in disease prevalence and outcomes. This study aims to analyze the relationship between neighborhood deprivation and diabetes prevalence in Minnesota by integrating the Area Deprivation Index (ADI 2020), Social Vulnerability Index (SVI 2020), and patient data from MHealth Fairview (2016–2020). Specifically, the research will evaluate the impact of neighborhood deprivation on diabetes, assess the role of Federally Qualified Health Networks (FQHNs) in supporting vulnerable populations, and identify spatial and demographic disparities in health outcomes.

To achieve these objectives, the study will map diabetes prevalence alongside ADI and SVI indicators to visualize geographic hotspots. Correlations between deprivation indices and health outcomes will be calculated using statistical techniques such as multivariate regression, while machine learning methods will be employed for feature selection and predictive modeling. Geospatial variables will be incorporated to enhance the accuracy of the analysis. Key research questions include understanding how ADI and SVI relate to diabetes prevalence and control, identifying spatial patterns of diabetes outcomes in vulnerable populations, and determining whether insights from deprivation indices can guide targeted interventions.

Expected outcomes include the identification of high-risk geographic areas in Minnesota with elevated diabetes prevalence and poor disease control, as well as quantitative relationships between neighborhood deprivation and health outcomes. The findings will provide evidence to support targeted interventions in high-deprivation communities. By partnering with FQHNs and vulnerable communities, the study seeks to validate results and prioritize actionable insights. While study participants may not receive direct benefits, the knowledge gained will contribute to broader public health efforts aimed at reducing diabetes-related health disparities.

Ke-Chin Chen

Machine learning approaches to predict genetic interactions in human cells

Advisor: Chad Myers, Department of Computer Science & Engineering

Abstract: Synthetic lethality (SL) occurs when mutations in two genes lead to cell death, whereas a mutation in only one of the genes allows the cell to survive. The goal of this project is to develop a machine learning model to predict SL gene pairs in human cells, which can facilitate the identification of new cancer drug targets.

To build this predictive model, we used a dataset from the human haploid cell line, HAP1, with quantitative measurements of double mutant effects for approximately 4 million unique gene pairs, where ~7,000 pairs were identified as SL pairs. Pairwise features were derived from the Cancer Dependency Map (DepMap), which includes various genomic datasets from hundreds of diverse cancer cell lines such as gene knockout effects, gene expression, and mutation data. Using these features, we trained and tested an XGBoost model on the HAP1 genetic-interaction dataset. The model achieved an AUROC of 0.80, which is close to the AUROC from control predictions based on biological replicate screens. Additionally, our method exhibited generalizability in predicting SL pairs across various cancer cell contexts using external datasets. This work provides a more efficient approach to designing future genetic interaction experiments and enhances our understanding of synthetic lethality in human cells.

Weiwen Chen

Selective Finetuning via Excess Loss for Enhanced Reasoning in Large Language Models

Advisor: Dongyeop Kang, Department of Computer Science & Engineering

Abstract: Large language models (LLMs) excel in numerous natural language processing tasks, yet their ability to perform complex reasoning remains suboptimal under standard finetuning approaches. Conventional approaches apply uniform training across all input data, potentially neglecting high-value tokens while incorporating noise. To address this limitation, we propose a selective training framework that enhances LLM reasoning by focusing finetuning on the most valuable regions of the input. Our method leverages excess loss - the difference between the training model’s loss and that of a reference model - to identify areas with the highest learning potential for the model. We introduce two strategies that operate at different levels of granularity: token-level selection, which isolates high-excess-loss tokens for targeted training, and segment-level selection, which aggregates tokens into meaningful reasoning steps and selects entire segments based on their cumulative excess loss. We evaluate our method on mathematical reasoning benchmarks such as GSM8K and MATH500, demonstrating significant improvements in reasoning performance compared to conventional finetuning.
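The excess-loss idea described above can be shown with a small sketch: per-token excess loss is the training model's loss minus the reference model's loss, and selection happens either per token or per segment. The top-fraction selection rule, the function names, and the loss values below are assumptions for illustration; the abstract does not specify the exact selection criterion.

```python
def excess_loss(train_losses, ref_losses):
    """Per-token excess loss: training-model loss minus reference-model loss."""
    return [t - r for t, r in zip(train_losses, ref_losses)]


def select_tokens(train_losses, ref_losses, keep_fraction=0.5):
    """Token-level selection: mask that keeps the highest-excess-loss tokens."""
    excess = excess_loss(train_losses, ref_losses)
    k = max(1, round(keep_fraction * len(excess)))
    order = sorted(range(len(excess)), key=lambda i: excess[i], reverse=True)
    chosen = set(order[:k])
    return [i in chosen for i in range(len(excess))]


def select_segments(train_losses, ref_losses, segments, keep=1):
    """Segment-level selection: keep segments with the largest cumulative excess.

    segments: list of (start, end) token index ranges, end exclusive,
    standing in for reasoning steps.
    """
    excess = excess_loss(train_losses, ref_losses)
    totals = [sum(excess[start:end]) for start, end in segments]
    order = sorted(range(len(segments)), key=lambda i: totals[i], reverse=True)
    return sorted(order[:keep])


# Hypothetical per-token losses for a six-token sequence.
train = [2.0, 0.5, 1.5, 0.3, 2.2, 1.0]
ref = [1.0, 0.4, 0.2, 0.3, 2.0, 0.1]
steps = [(0, 2), (2, 4), (4, 6)]

token_mask = select_tokens(train, ref, keep_fraction=0.5)
kept_steps = select_segments(train, ref, steps, keep=1)
print(token_mask, kept_steps)
```

In a training loop, a mask like `token_mask` would typically multiply the per-token loss so that only the selected tokens contribute to the gradient.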

Matthew Choquette

Machine Learning Based Early Solar Flare Detection for the FOXSI-5 Flare Campaign

Advisor: Jarvis Haupt, Department of Electrical and Computer Engineering

Abstract: Solar flares are among the most violent and sudden eruptions of energy in the solar system, capable of causing stunning aurorae on Earth, yet many of the underlying physical mechanisms remain a mystery. In 2024, the fourth flight of the Focusing Optics X-ray Solar Imager (FOXSI) became the first sounding rocket mission to observe a dedicated solar flare as part of a flare campaign. FOXSI-5 seeks to duplicate this success, but with a launch trigger which enhances ‘nowcasting’ capabilities to predict a solar flare’s future magnitude of X-ray emission mere minutes after it begins. This work examines the performance of several different machine learning model types to forecast near-term X-ray flux to guide a confidently targeted launch of FOXSI-5 en route to observing an active solar flare.

Jithendra Jagannatha Kagathi

Indic LLMs

Advisor: Serguei Pakhomov, Department of Pharmaceutical Care & Health Systems

Abstract: This project focuses on fine-tuning large language models for Sanskrit, an underrepresented Indic language. I address challenges like limited data, complex morphology, and tokenization inefficiencies. Synthetic datasets and tailored prompting strategies are used for translation and question-answering tasks.

Ben Kosieradzki

A Case Study for Implementing AI Modeling for Formulation Properties in the Chemical Industry

Advisor: Sapna Sarupria, Department of Chemistry

Abstract: This case study examines the application of artificial intelligence (AI) modeling to predict viscosity in complex chemical formulations—a critical property influencing product performance and processability. We implemented machine learning techniques to model the nonlinear relationships between raw material inputs, formulation parameters, and resulting viscosity outcomes. A major technical challenge was the inherent incompleteness of available industrial datasets, often marked by missing values, inconsistent data granularity, and limited historical standardization. Despite these limitations, the AI models demonstrated strong predictive capability and revealed underlying trends that were not easily discernible through conventional statistical methods. This work highlights the potential of AI-driven approaches to enhance formulation design and process optimization in the chemical industry, even when working with imperfect data.

Nithya Murikinati

Analyzing and Visualizing Road Network Changes in OpenStreetMap

Advisor: Mohamed Mokbel, Department of Computer Science & Engineering

Abstract: OpenStreetMap (OSM) is a dynamic, crowd-sourced mapping platform where road networks continuously evolve due to user contributions. Understanding these changes over time is crucial for urban planning, disaster response, and transportation analysis. This project focuses on tracking and visualizing modifications in road networks over a selected period by processing daily OSM change data. By analyzing mapping activity trends, infrastructure developments, and geographic patterns of edits, the project aims to generate meaningful visual representations that highlight the evolution of road networks. These insights can support researchers, policymakers, and mapping communities in studying the impact of road network transformations over time.

Walter Sands

VR Art and Data Visualization

Advisor: Daniel Keefe, Department of Computer Science & Engineering

Abstract: This project explores the application of sketch-based interfaces in Virtual Reality (VR) for immersive, artistic 3D data visualization. It addresses the challenge of making scientific data visualization tools more accessible to artists and illustrators, who possess a strong visual design sense but may lack the technical expertise in programming or mathematics required for conventional data visualization software. By integrating VR with sketch-based input methods, this interface enables illustrators to intuitively explore and illustrate 3D vector fields from fluid dynamics simulations in a fully immersive environment. The system enables users to interact with data directly through hand-drawn marks and gestures, which are interpreted relative to the underlying data. An algorithm "settles" the hand-drawn marks to fit the underlying vector data, ensuring that artistic input remains consistent with the physical properties of the data. The results demonstrate how artistic VR visualization can facilitate deeper understanding and communication of complex 3D flow phenomena represented in multivariate volumetric datasets.

Surabhi Sunil

Detection of Algospeak

Advisor: Stevie Chancellor, Department of Computer Science & Engineering

Abstract: TBD

Arjun Thonoor

Factors Shaping Academic Success: An In-Depth Analysis of Educational Outcomes

Advisor: Erich Kummerfeld, Institute for Health Informatics

Abstract: This capstone project aims to analyze student data from Hopkins High School to uncover academic trends and provide data-driven recommendations for targeted interventions. By conducting extensive exploratory data analysis and leveraging machine learning and deep learning algorithms, the study focuses on predicting future course enrollment patterns. A key area of investigation involves analyzing student course enrollment histories and academic performance to identify students who may benefit from early academic support. Preliminary findings highlight a distinct academic trajectory divergence between advanced and regular track students, originating as early as elementary school. These insights have the potential to inform more equitable and timely interventions, supporting students’ academic progression throughout their educational journey.

Shreya Yashodhar

Optimizing cancer therapy regimens using Reinforcement Learning

Advisor: Kevin Leder, Department of Industrial and Systems Engineering

Abstract: Prostate cancer treatment optimization faces a critical challenge: designing dosing schedules that suppress tumors while delaying therapy-induced resistance. This project introduces a reinforcement learning (RL) framework that integrates mechanistic models of tumor growth with clinical trial data to optimize adaptive dosing regimens. By simulating cancer cell dynamics through traditional growth models calibrated to biomarker data, the system captures patient-specific and population-wide evolution patterns. The RL agent processes multi-dimensional inputs to recommend personalized drug dosages and duration adjustments. Multi-objective rewards balance the evolutionary competition between resistant and responsive cells, while accommodating combination therapies and intermittent dosing. Previous related works have highlighted the potential of these models to serve as a decision-support tool for adaptive prostate cancer therapy.