Posted June 2007
Data in high dimensions is becoming ubiquitous, from image analysis and finance to computational biology and neuroscience.
This data is often given or represented as samples embedded in a high-dimensional Euclidean space (point cloud data), though it is assumed to belong to lower-dimensional manifolds. Thus, in recent years, there have been significant efforts to develop methods for analyzing these point clouds and their underlying manifolds. These include numerous techniques for estimating the intrinsic dimension of the data and for projecting it onto lower-dimensional representations. This area of work is often called manifold learning and dimensionality reduction.
The vast majority of the techniques developed in the literature assume, either explicitly or implicitly, that the given point cloud consists of samples of a single manifold. Yet a significant part of the interesting data has mixed dimensionality and complexity. That is, we have samples not of a manifold but of a stratification.
In these cases it is useful to cluster the data according to the complexity (dimensionality) of the possibly multiple underlying manifolds (see example in figure above). Such clustering can be used both to better understand the varying dimensionality and complexity of the data, e.g., states in neural recordings or different human activities in video analysis, and as a pre-processing step for some manifold learning and dimensionality reduction techniques.
IMA postdoc Gloria Haro and IMA long-term visitors Gregory Randall and Guillermo Sapiro have proposed a technique for stratification learning. The method is based on a mixture of Poisson distributions that locally model the counting process of points in the different manifolds. Their technique automatically gives a soft clustering of the point cloud according to dimensionality and density, with an estimate of both quantities for each class. The figure below illustrates two applications of the technique in computer vision, first to the recognition of digits, and second to the classification of activities recorded in a video.
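As a rough illustration of the counting-process idea, the following Python sketch estimates a local dimension at every point of a synthetic point cloud with mixed dimensionality. It is not the mixture model described above: it uses the simpler single-manifold maximum-likelihood estimator of Levina and Bickel, which is also derived from a Poisson model for the number of neighbors within a given radius, and it assigns hard labels by rounding rather than producing a soft clustering. The function name `local_dimension_mle`, the neighborhood size `k`, and the toy data (a segment and a square in 3-D space) are all assumptions made for this example.

```python
import numpy as np


def local_dimension_mle(points, k=10):
    """Local intrinsic dimension at each point (Levina-Bickel style MLE).

    For a point x with sorted neighbor distances T_1 <= ... <= T_k, the
    estimate is (k - 1) / sum_{j<k} log(T_k / T_j).
    """
    # Dense pairwise Euclidean distances; fine for a few hundred points.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Column 0 of each sorted row is the zero distance of a point to itself.
    neighbor_d = np.sort(dists, axis=1)[:, 1:k + 1]            # shape (n, k)
    log_ratios = np.log(neighbor_d[:, -1:] / neighbor_d[:, :-1])
    return (k - 1) / log_ratios.sum(axis=1)


# Toy stratification (an assumption for this sketch): a 1-D segment and a
# 2-D square, both embedded in 3-D space and well separated from each other.
rng = np.random.default_rng(0)
n = 300
line = np.column_stack([np.full(n, 2.0), np.full(n, 2.0), rng.uniform(0, 1, n)])
plane = np.column_stack([rng.uniform(0, 1, n), rng.uniform(0, 1, n), np.zeros(n)])
cloud = np.vstack([line, plane])

dims = local_dimension_mle(cloud, k=10)
line_dims, plane_dims = dims[:n], dims[n:]

# Hard labels obtained by rounding the local estimates; the method described
# above instead returns soft class memberships from a mixture model.
hard_labels = np.rint(dims).astype(int)

print("median estimate on the segment:", np.median(line_dims))
print("median estimate on the square: ", np.median(plane_dims))
```

Individual estimates are noisy for small `k`, which is one reason a mixture model that pools information across points of the same class is attractive compared with rounding per-point estimates.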