Deep Learning meets PDE Workshop: Talk titles and abstracts

Harbir Antil

Digital Twins, Generative AI, and Beyond: A PDE-Constrained Optimization Perspective

Digital Twins (DTs) are adaptive, real-time virtual replicas of physical systems that integrate physics-based models, sensor data, and intelligent decision-making. At their core, DTs can be rigorously framed within PDE-constrained optimization (PDECO). This talk develops a unified PDECO framework for state estimation and control, leveraging adjoint-based methods in both deterministic and stochastic settings.

To address the challenges of infinite-dimensional, large-scale optimization, we introduce novel function-space trust-region and augmented Lagrangian algorithms, and explore the role of randomized methods in dynamic PDECO.

A central theme is a new connection between PDECO and Generative AI: score-based generative models can be interpreted as backward-in-time PDEs, linking ill-posed inverse problems, stability analysis, and modern machine learning. This perspective bridges physics-informed modeling with data-driven synthesis, opening the door to, for instance, score-based Digital Twins.

Applications span a wide range of domains, from structural and biomedical systems (bridges, dams, aneurysm modeling) to optimal insulation, electromagnetic cloaking, light bending, fusion, and neuromorphic computing. Together, these examples highlight a pathway toward predictive, adaptive, and trustworthy Digital Twins and AI technologies.

Wei Cai

Overcoming the Curse of Dimensionality in the Era of Deep Learning Computing

Deep learning is a transformative mathematical technique with broad impact across many areas of scientific and engineering research. In this talk, we will present recent results on efficient deep learning algorithms for addressing the curse of dimensionality (CoD) encountered in a wide range of computational science and engineering problems. We focus on three problems: [1] Solving stochastic optimal control problems arising from robotic systems, production management, financial portfolio optimization, and risk control. We developed a martingale neural network, based on Varadhan’s martingale formulation of PDEs, to solve the Hamilton-Jacobi-Bellman equation in dynamic programming in dimensions up to 10000. [2] Sampling high-dimensional transient distributions governed by Fokker-Planck equations (FPEs) for interacting many-particle systems from biology and statistical physics. A deep neural pushforward map is learnt to generate the target samples through adversarial training on an ultra-weak form of the FPE. [3] Learning the infinite-dimensional operator that maps medium properties to the scattering wave field solution, using a multiscale Fourier neural operator (MscaleFNO), which is designed to overcome the spectral bias inherent in standard deep learning architectures and can be used as a surrogate model for high-frequency inverse medium problems in medical imaging and geophysical exploration.

Eric Cyr

Towards multilevel training algorithms: Applying scientific computing perspectives to neural networks

The recent explosion of large language models (LLMs) in the commercial space has created unprecedented energy demands driven by the growth of data centers. One aspect of this is the need to train large-scale neural network models on massive amounts of data. Recent work has demonstrated that pre-training an LLM on the Frontier supercomputer would require two years with ideal parallelism. Yet despite advances in optimizers, the training algorithms remain largely the same even as the neural networks scale to trillions of parameters and suffer from quadratic scaling in the number of parameters. This motivates our aspirational hypothesis that multilevel (or hierarchical) methodologies can dramatically accelerate training algorithms.

This talk proceeds in three parts. In the first part, we discuss an adaptive basis perspective that has proved fruitful in Scientific Machine Learning (SciML). With this perspective we develop efficient “operator-split” training algorithms and new initialization strategies motivated by stability concerns. The second part of the talk considers the impact of using quasi-second-order methods to address the “grokking” phenomenon. Through the lens of spectral bias, we show how Levenberg–Marquardt reduces the generalization gap in our experiments and allows training to proceed quickly through the lazy learning regime towards the rich one. The third part presents a view of Kolmogorov-Arnold networks (KANs) that reformulate the activation function as a spline that can be naturally adapted. We explore how this approach can be related to a multi-channel ReLU network and present a multilevel training algorithm based on the relaxation properties of gradient descent using KANs.

Holistically, the goal of this talk is to show how we applied techniques developed for scientific computing to understand and improve neural network training.

Xue Feng

Learn to Evolve: self-supervised Neural JKO Operator for Wasserstein Gradient Flow

The Jordan-Kinderlehrer-Otto (JKO) scheme provides a stable variational framework for computing Wasserstein gradient flows, but its practical use is often limited by the high computational cost of repeatedly solving the JKO subproblems. We propose a self-supervised approach for learning a JKO solution operator without requiring numerical solutions of any JKO trajectories. The learned operator maps an input density directly to the minimizer of the corresponding JKO subproblem, and can be applied iteratively to efficiently generate the gradient-flow evolution. A key challenge is that only a limited number of initial densities are typically available for training. To address this, we introduce a Learn-to-Evolve algorithm that jointly learns the JKO operator and its induced trajectories by alternating between trajectory generation and operator updates. As training progresses, the generated data increasingly approximate true JKO trajectories. Meanwhile, this Learn-to-Evolve strategy serves as a natural form of data augmentation, significantly enhancing the generalization ability of the learned operator. Numerical experiments demonstrate the accuracy, stability, and robustness of the proposed method across various choices of energies and initial conditions.
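
For readers new to the JKO scheme, one step replaces the current density rho_k by the minimizer of E(rho) + W_2^2(rho, rho_k) / (2 tau). The sketch below is not the neural operator from the talk. It assumes the simplest possible setting, a pure potential energy E(rho) = integral of V d(rho) with V(x) = x^2 / 2 in one dimension and rho represented by particles, where the step reduces to the per-particle proximal map x -> x / (1 + tau). The names jko_step_quadratic and jko_flow are hypothetical.

```python
import numpy as np

def jko_step_quadratic(particles, tau):
    """One JKO step for the potential energy with V(x) = x^2 / 2 (assumed toy case)."""
    # Each particle solves min_y y^2/2 + (y - x)^2 / (2*tau), whose minimizer is x / (1 + tau).
    return particles / (1.0 + tau)

def jko_flow(particles, tau, n_steps):
    """Apply the JKO step repeatedly, mimicking iterated use of a solution operator."""
    trajectory = [particles]
    for _ in range(n_steps):
        particles = jko_step_quadratic(particles, tau)
        trajectory.append(particles)
    return np.stack(trajectory)

x0 = np.array([-2.0, 0.5, 1.0])
traj = jko_flow(x0, tau=0.1, n_steps=50)  # rows are successive particle configurations
```

In this toy case the iterates coincide with implicit Euler steps for x' = -x, which is the sense in which the JKO scheme is a stable time discretization of the gradient flow.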

Rongjie Lai

SCOPE: Self-supervised In-Context Operator Learning on Probability Measure Space

Many fundamental problems on probability measure spaces, such as optimal transport, mean field games, and Wasserstein gradient flows, are computationally demanding, while existing learning-based methods often rely on single-instance solvers and require costly retraining for each new instance. In-context learning with transformer models offers a new paradigm for approximating families of operators from a few context examples without task-specific retraining. I will discuss our recent work on an unsupervised in-context operator learning framework for learning solution operators on probability measure spaces. The method is discretization-free, making it effective for high-dimensional measure transport problems, and does not require supervised solution labels, substantially reducing the cost of data generation. I will also discuss a generalization error analysis of the proposed transformer-based model, connecting it to emerging theory on in-context learning and highlighting broader theoretical implications.

Chun Liu

Energetic Variational Approaches: Diffusion and Beyond

In this talk I will discuss various topics related to generalized diffusion, including coupling and competition with other effects. We will apply the general framework of energetic variational approaches, especially Onsager's maximum dissipation principle, to these systems. We will also discuss various analytical issues arising from these studies.

Fei Lu

Learning from unlabeled data for interacting particle systems

Learning the dynamics of complex high-dimensional interacting particle systems is a fundamental task across various scientific disciplines. A major challenge is that unlabeled data collected at discrete time points lack trajectory information due to limitations in data collection methods or privacy constraints. We address this challenge by developing a trajectory-free loss function that leverages the weak-form stochastic evolution equation for the empirical distribution based on the Itô formula. The loss function is quadratic in both the external and interaction potentials, yielding parametric and nonparametric regression algorithms for robust estimation that scale to large, high-dimensional systems with big data. Systematic numerical tests show that our method outperforms baseline methods that regress on trajectories recovered via label matching and that it tolerates large observation time steps. We establish the convergence of parametric estimators as the sample size increases, providing a performance guarantee and theoretical foundation for the proposed approach.

Mitch Luskin

Deep learning for solving a many-body Schrödinger equation

The Hilbert space for the electronic wave function grows exponentially in the number of electrons N. Self-attention neural networks have been developed that achieve good approximations of the ground state (the eigenfunction for the smallest eigenvalue of the many-body Schrödinger equation) with N^2 variational parameters for a 2D Coulomb gas in a periodic potential. I will introduce self-attention neural networks for the many-body Schrödinger equation and present recent results for the approximation of a 2D Coulomb gas in the periodic moiré potential of a relaxed 2D heterostructure.

Joint work with Ziyan Zhu, Max Geier, and Liang Fu.

Levon Nurbekyan

Efficient Natural Gradient Descent Methods for Large-Scale PDE-Based Optimization Problems

We propose efficient numerical schemes for implementing natural gradient descent (NGD) for a broad range of metric spaces, with applications to PDE-based optimization problems. Our technique represents the natural gradient direction as a solution to a standard least-squares problem. Hence, instead of calculating, storing, or inverting the information matrix directly, we apply efficient methods from numerical linear algebra. We treat both scenarios: where the Jacobian, i.e., the derivative of the state variable with respect to the parameter, is explicitly known, and where it is implicitly given through constraints. We can thus reliably compute several NGDs for a large-scale parameter space. In particular, we are able to compute Wasserstein NGD in thousands of dimensions, which was believed to be out of reach. Finally, our numerical results shed light on the qualitative differences between standard gradient descent and various NGD methods based on different metric spaces in nonconvex optimization problems.
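
The least-squares formulation can be illustrated in the simplest setting. The sketch below assumes a Euclidean (L2) metric on the state, so the information matrix is G = J^T J and the Euclidean gradient is J^T r, where r is the gradient of the objective with respect to the state. The natural gradient direction G^{-1} J^T r is then the least-squares solution of J eta = r. The function name, the random test data, and the restriction to the L2 metric are assumptions made for illustration; other metrics, such as the Wasserstein one mentioned in the abstract, would change the least-squares problem.

```python
import numpy as np

def natural_gradient_direction(J, state_grad):
    """Solve min_eta ||J @ eta - state_grad||_2 without forming J.T @ J."""
    eta, *_ = np.linalg.lstsq(J, state_grad, rcond=None)
    return eta

rng = np.random.default_rng(0)
J = rng.standard_normal((50, 5))      # Jacobian d(state)/d(parameter), assumed full column rank
state_grad = rng.standard_normal(50)  # gradient of the objective with respect to the state

eta = natural_gradient_direction(J, state_grad)

# Reference computation that explicitly forms and inverts the information matrix.
eta_ref = np.linalg.solve(J.T @ J, J.T @ state_grad)
```

Both computations give the same direction here, but the least-squares route avoids squaring the condition number of J and never stores the information matrix.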

Lorenzo Pareschi

Structure-Preserving Neural Surrogates for Uncertainty Quantification in Plasma Physics

Reliable uncertainty quantification is a central challenge in plasma simulation, especially in kinetic regimes where predictive computations remain extremely expensive. This issue is particularly relevant in fusion-oriented plasma modeling, where multiscale effects, high dimensionality, and sensitivity to uncertain inputs make brute-force sampling unaffordable. In this talk I will present a multifidelity framework for the Vlasov-Poisson-Landau equation that combines asymptotic-preserving solvers, reduced plasma models, and tensor neural surrogates based on a micro-macro decomposition. The resulting approach produces inexpensive low-fidelity samples that remain strongly correlated with the high-fidelity kinetic model, leading to substantial variance reduction and computational savings. More broadly, this provides an example of how machine learning can become genuinely effective for PDEs when it is built around structure rather than used as a black box.

Tianyun Tang

Convex relaxation approaches for high-dimensional optimal transport

Optimal transport (OT) is a powerful tool in mathematics and data science but faces severe computational and statistical challenges in high dimensions. We propose convex relaxation approaches based on marginal and cluster moment relaxations that exploit locality and correlative sparsity in the distributions. These methods approximate high-dimensional couplings using low-order marginals and sparse moment statistics, yielding semidefinite programs that provide lower bounds on the OT cost with greatly reduced complexity. For Gaussian distributions with sparse correlations, we prove reductions in both computational and sample complexity, and experiments show the approach also works well for non-Gaussian cases. In addition, we demonstrate how to extract transport maps from our relaxations, offering a simpler and interpretable alternative to neural networks in generative modeling. Our results suggest that convex relaxations can provide a promising path for dimension reduction in high-dimensional OT.

Xiaochuan Tian

Sparse RBF Networks for PDEs and nonlocal equations

We present a unified sparse radial basis function (RBF) network framework for solving nonlinear PDEs and nonlocal equations, including fractional-order operators. The method combines kernel-based representations with sparsity-promoting regularization and adaptive-width shallow networks, producing compact and interpretable solutions. The formulation is grounded in a reproducing kernel Banach space (also known as a variation space) induced by one-hidden-layer networks of infinite width. A representer theorem guarantees the existence of finite sparse solutions and yields error bounds that connect the method to classical numerical analysis. For a broad class of radial kernels, we prove that the associated solution space admits a unified characterization as a Besov space, largely independent of the specific kernel choice. The explicit kernel structure enables quasi-analytical evaluation of differential and nonlocal operators, including fractional Laplacians. Computationally, the method is implemented via a three-phase adaptive algorithm consisting of feature discovery, second-order optimization, and pruning. Numerical experiments on high-order PDEs, fractional equations, and Eikonal equations illustrate the trade-offs among accuracy, sparsity, and computational cost, and demonstrate the broad applicability of the proposed framework.

Richard Tsai

Residual minimization methods for solving PDEs using neural networks

Artificial neural networks and modern accelerators provide a platform for developing mesh-free approaches to solving partial differential equations, particularly in high-dimensional settings where classical grid-based methods become infeasible. These methods typically reduce PDE solving to a nonlinear optimization problem: adjust the parameters of a network function to minimize the residual of the differential operator. This residual minimization viewpoint is the subject of the talk.

We begin by establishing necessary conditions under which residual minimization can recover the solution of a well-posed initial-boundary value problem. Through elementary examples, we examine what goes wrong when these conditions fail and why the resulting optimization landscapes give rise to loss-function fallacies: situations where small training loss does not imply small solution error. In the second half, we show how classical numerical principles both diagnose these pitfalls and suggest superior alternatives, through two applications: (1) Hamilton-Jacobi equations, where neural solvers struggle with viscosity solutions in high dimensions, and (2) integral equations, where the dense, singular matrices arising from quadrature demand structure-aware inversion. The talk aims to give graduate students a framework for deciding when to trust neural solvers and how to use classical theory to build more robust computational tools.
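
One elementary way in which a small training loss can coexist with a large solution error is sketched below. It is an illustrative assumption, not necessarily one of the examples from the talk. For u' = lam * u with u(0) = 1 and lam > 0, the candidate u_eps(t) = (1 + eps) * exp(lam * t) satisfies the ODE exactly and misses the initial condition by eps, so a residual-type loss equals eps^2, while the error at time T is eps * exp(lam * T). The values of lam, T, and eps are arbitrary choices.

```python
import numpy as np

lam, T, eps = 10.0, 2.0, 1e-4
t = np.linspace(0.0, T, 201)

def candidate(t):
    """Approximate solution with a slightly wrong initial value."""
    return (1.0 + eps) * np.exp(lam * t)

def candidate_derivative(t):
    """Exact time derivative of the candidate."""
    return lam * candidate(t)

exact = np.exp(lam * t)

# Residual-type loss: mean squared ODE residual plus squared initial-condition mismatch.
ode_residual = candidate_derivative(t) - lam * candidate(t)
loss = np.mean(ode_residual**2) + (candidate(0.0) - 1.0) ** 2

max_error = np.max(np.abs(candidate(t) - exact))
print(f"loss = {loss:.3e}, max error = {max_error:.3e}")
```

The loss is of order eps^2, yet the error is amplified by exp(lam * T), so the loss value alone says little about accuracy unless the stability of the underlying problem is taken into account.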

Jack Xin

Stochastic Interacting Particle Methods and Generative Learning for Multiscale PDEs

Multiscale time-dependent partial differential equations (PDEs) are challenging to compute by mesh-based methods, especially when their solutions develop large gradients or concentrations at unknown locations. We discuss stochastic interacting particle (SIP) methods for advection-diffusion-reaction PDEs based on probabilistic representations of solutions, and show their self-adaptivity and efficiency in several space dimensions. Using SIP solutions as training data, we compare generative AI models (optimal transport, diffusion, flow-matching, and one-step diffusion) in learning, interpolating, and predicting solutions as physical parameters vary.

Wuzhe Xu

Diffusion Models for Scientific Computing

Diffusion models have shown growing promise in scientific computing. In this talk, I will first discuss a line of work on diffusion-based scientific data enhancement, including correction, unpaired super-resolution, and physics-guided downscaling for PDE-related problems. These methods demonstrate the effectiveness of diffusion models across a wide range of scientific settings, but they remain largely tailored to fixed tasks or target distributions. I will then turn to in-context learning for diffusion models over spaces of probability distributions. Given a context set of samples from a previously unseen task distribution, we use a transformer-based neural operator to generate additional samples without retraining or task-specific adaptation. I will discuss scaling laws for this framework and show both its effectiveness and its limitations.