Two CS&E Faculty Receive Amazon Gift Funds Totaling $300K
Department of Computer Science & Engineering Associate Professor Caiwen Ding and Assistant Professor Zirui Liu each received $150K in gift funding from Amazon to support their research. Both projects advance methods for improving the efficiency of artificial intelligence (AI) systems.
Ding’s project, titled “KernelForge: An Agentic AI Framework with Hierarchical Hardware Feedback for AI Accelerator Optimization,” aims to make it easier and faster to create high-performance Graphics Processing Unit (GPU) kernels.
Kernels are the specialized programs that instruct hardware on how to perform specific computations. Currently, writing these kernels by hand is slow and difficult, and hand-written kernels quickly become outdated as algorithms change, Ding explained. His research aims to address this by developing an AI system that automatically generates and optimizes kernels for various types of AI accelerators.
The KernelForge project builds on CUDAForge, Ding’s previous work. CUDAForge is a training-free, multi-agent system that combines two agents: one that creates the kernel and another that checks its performance and guides optimization. It performed faster than other methods while achieving high accuracy and low computational cost.
KernelForge expands on Ding's previous research by covering a broader range of AI accelerators. This will allow the system to generate optimized kernels across many different hardware platforms, not just Compute Unified Device Architecture (CUDA)-based GPUs.
Liu's project is titled “Low-precision Reinforcement Learning System via Algorithm-Kernel Co-design.” His research aims to bring low-precision computing to reinforcement learning training. Low-precision computing converts high-precision numbers into lower-precision ones, which increases computational speed. Reinforcement learning trains AI through trial and error and is an important part of training large language models (LLMs).
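To make the idea of lower precision concrete, here is a minimal Python sketch that rounds ordinary 64-bit numbers to the 16-bit half-precision format. It is only an illustration and is not taken from Liu's project; the helper name `to_fp16` is hypothetical.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float (64-bit) to the nearest IEEE 754 half-precision value.

    Hypothetical helper for illustration: packing with format 'e' stores the
    number in 16 bits, and unpacking reads the rounded value back.
    """
    return struct.unpack("e", struct.pack("e", x))[0]

# 1.0 fits exactly in 16 bits, so nothing is lost.
one = to_fp16(1.0)

# 0.1 cannot be stored exactly; the 16-bit version is only close to it.
tenth = to_fp16(0.1)
rounding_error = abs(tenth - 0.1)

# Large odd integers are also affected: 2049 is not representable in FP16.
big = to_fp16(2049.0)

print(one, tenth, rounding_error, big)
```

Each stored value takes a quarter of the memory of a 64-bit number, which is where the speed gain comes from, but the small rounding errors are the price.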
While low-precision computing is faster, it can cause precision mismatch when used for reinforcement learning. Precision mismatch occurs when the training engine and the rollout engine produce different probability values for the same model. The reinforcement learning algorithm then behaves as if it were training on the wrong data, causing the model to fail.
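One common source of this kind of disagreement is that floating-point addition depends on the order of operations. The sketch below is a simplified illustration, not code from either engine: the two functions stand in for a training engine and a rollout engine that add the same three numbers in different orders, and all names are hypothetical.

```python
import math

def sum_left_to_right(values):
    """Stand-in for one engine: accumulate from the first element to the last."""
    total = 0.0
    for v in values:
        total += v
    return total

def sum_right_to_left(values):
    """Stand-in for the other engine: accumulate from the last element to the first."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total

weights = [0.1, 0.2, 0.3]  # same inputs for both engines

train_total = sum_left_to_right(weights)
rollout_total = sum_right_to_left(weights)

# Normalizing by the two totals gives two slightly different "probabilities"
# for the same item under the same model.
train_prob = weights[2] / train_total
rollout_prob = weights[2] / rollout_total

print(train_total == rollout_total)               # the totals are not bitwise equal
print(math.isclose(train_total, rollout_total))   # but they are extremely close
print(train_prob, rollout_prob)
```

The gap is tiny in this toy case. With fewer bits per number and many more additions, as in low-precision training of a large model, such gaps can grow large enough to matter.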
Current solutions to precision mismatch work only at the algorithm level and do not address the root of the problem. Liu’s team looks at both the algorithm level and the system level. On the system side, they have developed Tree-Based Invariant Kernels (TBIK), which ensure that the training and rollout engines produce bitwise-identical results, enabling true on-policy reinforcement learning. On the algorithm side, the project introduces a method to scale and round low-precision numbers to maintain accuracy during training. Part of this research was selected for an oral presentation at NeurIPS 2025.
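The general idea behind making results bitwise-identical can be sketched in a few lines of Python. This is not the TBIK implementation, whose details are not described here; it only illustrates, under simplifying assumptions, that adding numbers in one fixed binary-tree order gives the same bits no matter how the work is split. The function names are hypothetical, and the sketch assumes the input length and the number of shards are powers of two.

```python
def tree_sum(values):
    """Add values pairwise in a fixed binary-tree order.

    Assumes a non-empty list whose length is a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    level = list(values)
    while len(level) > 1:
        # Combine neighbors: (0,1), (2,3), ... then repeat on the results.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]

def sharded_tree_sum(values, num_shards):
    """Split the work across num_shards workers, each reducing one aligned subtree.

    Because every shard is a complete subtree of the same overall tree, the
    additions happen in exactly the same order for any valid shard count.
    """
    shard_len = len(values) // num_shards
    partials = [tree_sum(values[i:i + shard_len])
                for i in range(0, len(values), shard_len)]
    return tree_sum(partials)

data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

# One worker, two workers, four workers, eight workers: identical bits each time.
results = {sharded_tree_sum(data, n) for n in (1, 2, 4, 8)}
print(len(results))
```

Because the order of additions never changes, a training engine and a rollout engine that split the work differently would still agree exactly in this toy setting.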
Liu hopes his research will make reinforcement learning training more accessible and sustainable. With precision mismatch solved at its root, researchers would no longer need to fall back on expensive FP32 computation as a workaround.
Learn more about Ding’s research and Liu’s research on their personal websites.