Inferring Manifolds from Noisy Data: Non-Parametric Estimation and Random Walks in Shape Space
Data Science Seminar
Shira Faigenbaum Golovin (Duke University)
Abstract
High-dimensional data is increasingly available in data-driven applications, offering an opportunity to understand its underlying structure. A common assumption is that such data lies on or near a low-dimensional manifold embedded in a high-dimensional space, and that it is often contaminated by noise and outliers. In this talk, we will first introduce a non-parametric method for denoising and reconstructing a low-dimensional manifold from scattered high-dimensional data, along with a theoretical analysis of the method's approximation order and of convergence for the underlying non-convex optimization problem.
Next, we will explore cases where the scattered data possesses its own intrinsic geometry (e.g., images or surfaces). In this setting, the goal shifts to inferring the geometry of a "manifold of manifolds" while preserving the internal geometry of the individual data points. By leveraging similarities within the shape space, we demonstrate how the geometry can be studied through random walks, not only on the manifold itself but also within its fibers. We will discuss the concept of the horizontal diffusion map, introduced by Gao (2021), which provides a nonlinear horizontal embedding that identifies a new domain capturing meaningful geometric representations of the data. Finally, we will demonstrate the effectiveness of this representation by presenting a novel method for registering anatomical surfaces, offering new insights into shape variation and the evolutionary processes of primates.