Large data limit of the MBO scheme for data clustering
Data Science Seminar
Jona Lelmi (University of California, Los Angeles)
The MBO (Merriman-Bence-Osher) scheme is an efficient algorithm for data clustering. Given a set of data points, one constructs the associated similarity graph, and the goal is to split the data into meaningful clusters. The algorithm produces the clusters by alternating between diffusion on the graph and pointwise thresholding. In this talk I will present the first theoretical studies of the scheme in the large data limit. We will see how the final state of the algorithm is asymptotically related to minimal surfaces in the data manifold, and how the dynamics of the scheme are asymptotically related to the trajectory of steepest descent for surfaces, which is mean curvature flow. The tools employed are variational methods and viscosity solution techniques. Based on joint work with Tim Laux (U Bonn).
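The alternation between graph diffusion and pointwise thresholding described above can be illustrated with a minimal sketch for two clusters. This is not the speaker's implementation. The Gaussian similarity kernel, the unnormalized graph Laplacian, the threshold value 1/2, the exact heat operator computed by eigendecomposition, and all function and parameter names (`similarity_graph`, `mbo_two_clusters`, `sigma`, `tau`) are assumptions chosen for illustration.

```python
import numpy as np


def similarity_graph(points, sigma):
    """Dense Gaussian similarity weights (an assumed choice of kernel)."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def mbo_two_clusters(W, u0, tau, max_iter=100):
    """Alternate diffusion for time tau with thresholding at 1/2.

    W  : symmetric weight matrix of the similarity graph
    u0 : initial 0/1 labels, one per data point
    """
    # Unnormalized graph Laplacian L = D - W (assumed normalization).
    L = np.diag(W.sum(axis=1)) - W
    # Heat operator exp(-tau * L); L is symmetric, so eigh applies.
    evals, evecs = np.linalg.eigh(L)
    heat = (evecs * np.exp(-tau * evals)) @ evecs.T

    u = np.asarray(u0, dtype=float)
    for _ in range(max_iter):
        diffused = heat @ u                       # diffusion step
        new_u = (diffused >= 0.5).astype(float)   # thresholding step
        if np.array_equal(new_u, u):              # stationary labelling
            break
        u = new_u
    return u.astype(int)


# Toy data: two well-separated groups of 10 points on a line in the plane.
group_a = np.stack([0.1 * np.arange(10), np.zeros(10)], axis=1)
group_b = group_a + np.array([10.0, 0.0])
points = np.vstack([group_a, group_b])

# Initial labelling with two deliberately wrong labels in each group.
u0 = np.concatenate([np.ones(10), np.zeros(10)])
u0[[0, 5]] = 0.0
u0[[12, 17]] = 1.0

W = similarity_graph(points, sigma=1.0)
labels = mbo_two_clusters(W, u0, tau=1.0)
```

In this toy setting the diffusion step averages the labels within each tightly connected group, and the thresholding step then snaps each point back to 0 or 1. A dense eigendecomposition only suits small graphs. For large data sets one would typically replace it with a sparse or truncated spectral approximation, which is outside the scope of this sketch.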