Large data limit of the MBO scheme for data clustering

Data Science Seminar

Jona Lelmi (University of California, Los Angeles)

Abstract

The MBO scheme is a highly performant algorithm for data clustering. Given some data, one constructs the similarity graph associated with the data points. The goal is to split the data into meaningful clusters. The algorithm produces the clusters by alternating between diffusion on the graph and pointwise thresholding. In this talk I will present the first theoretical studies of the scheme in the large data limit. We will see how the final state of the algorithm is asymptotically related to minimal surfaces in the data manifold, and how the dynamics of the scheme are asymptotically related to the trajectory of steepest descent for surfaces, namely mean curvature flow. The tools employed are variational methods and viscosity solution techniques. Based on joint work with Tim Laux (U Bonn).
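
For readers unfamiliar with the scheme, the alternation between graph diffusion and pointwise thresholding can be sketched in a few lines of Python. This is only an illustrative sketch and not the speaker's implementation: the Gaussian similarity kernel, the unnormalized graph Laplacian, the exact heat kernel computed by eigendecomposition, the threshold value 1/2, the restriction to two clusters, and all function and parameter names (`similarity_graph`, `mbo_two_clusters`, `sigma`, `diffusion_time`) are assumptions made for the example.

```python
import numpy as np


def similarity_graph(points, sigma=1.0):
    """Dense Gaussian similarity weights between all pairs of points (assumed kernel)."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs**2, axis=-1)
    weights = np.exp(-sq_dists / (2.0 * sigma**2))
    np.fill_diagonal(weights, 0.0)  # no self-loops
    return weights


def mbo_two_clusters(weights, initial_labels, diffusion_time=1.0, max_iters=100):
    """Alternate graph diffusion and thresholding at 1/2 until the labels stop changing."""
    # Unnormalized graph Laplacian L = D - W (one common choice; an assumption here).
    laplacian = np.diag(weights.sum(axis=1)) - weights
    # Heat kernel exp(-t L) via the eigendecomposition of the symmetric matrix L.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    heat_kernel = eigvecs @ np.diag(np.exp(-diffusion_time * eigvals)) @ eigvecs.T

    labels = np.asarray(initial_labels, dtype=float)
    for _ in range(max_iters):
        diffused = heat_kernel @ labels                # diffusion step
        new_labels = (diffused >= 0.5).astype(float)   # pointwise thresholding step
        if np.array_equal(new_labels, labels):         # stationary state reached
            break
        labels = new_labels
    return labels.astype(int)


# Toy data: two tight groups of five points each, far apart from one another.
group = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05]])
points = np.vstack([group, group + 10.0])

# Initial guess with one mislabeled point in each group.
initial = np.array([1, 0, 0, 0, 0, 1, 1, 1, 1, 0])

labels = mbo_two_clusters(similarity_graph(points), initial)
print(labels)
```

On this toy data the diffusion step averages the labels within each tight group, so the thresholding step overrides the two mislabeled points and the iteration stops at the partition into the two groups. The dense eigendecomposition is chosen for readability only; it scales cubically in the number of points and would not be practical for large data sets.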

This recording was created before the current policy requirements took effect, and therefore may not be accessible. To request this content in an accessible format, contact [email protected].
