Large Data Limits and Scaling Laws of t-SNE

Data Science Seminar

Adam Pickarski
North Carolina State University

Abstract

We consider large-data asymptotics for t-distributed stochastic neighbor embedding (t-SNE), a widely used nonlinear dimension reduction algorithm. We identify an appropriate continuum limit of the t-SNE objective function, which can be viewed as a combination of a kernel-based repulsion and an asymptotically vanishing Laplacian-type regularizer. As a consequence, we show that embeddings produced by the original t-SNE algorithm cannot have any consistent limit as $n \to \infty$. We propose a rescaled model that mitigates the asymptotic decay of the attractive energy and does have a consistent limit. We derive algorithms from this limit and provide empirical results. We also propose an alternative large-data limit in which we alter not the model but the scale of the plots. We investigate the well-posedness of this limit in a one-dimensional setting and identify necessary conditions in higher dimensions.
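For readers unfamiliar with the objective the abstract refers to, the following is a minimal NumPy sketch of the standard t-SNE gradient, in which the $(p_{ij} - q_{ij})$ factor splits into the attractive term (driven by the Gaussian input affinities $P$) and the repulsive term (driven by the heavy-tailed Student-t affinities $Q$ on the embedding). This is an illustrative toy, not the speaker's rescaled model; all names and parameters here are assumptions.

```python
# Illustrative sketch of the (unmodified) t-SNE gradient; not the
# rescaled model from the talk. Assumes symmetric, normalized P.
import numpy as np

def tsne_gradient(P, Y):
    """Gradient of KL(P || Q) with respect to the embedding Y.

    Uses the standard formula
      dC/dy_i = 4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + ||y_i - y_j||^2),
    where Q is the normalized Student-t kernel on the embedding.
    """
    diff = Y[:, None, :] - Y[None, :, :]            # pairwise differences
    W = 1.0 / (1.0 + np.sum(diff**2, axis=-1))      # Student-t weights
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()                                  # low-dim affinities
    return 4.0 * np.einsum('ij,ijk->ik', (P - Q) * W, diff)

# Toy usage: Gaussian affinities on synthetic data, plain gradient descent.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))                     # n = 50 sample points
D = np.sum((X[:, None] - X[None, :])**2, axis=-1)
P = np.exp(-D / 2.0)                                 # Gaussian attraction kernel
np.fill_diagonal(P, 0.0)
P /= P.sum()

Y = 1e-2 * rng.standard_normal((50, 2))              # 2-D embedding
for _ in range(100):
    Y -= 10.0 * tsne_gradient(P, Y)
```

In real implementations the input affinities are bandwidth-tuned per point via a perplexity parameter and symmetrized; the uniform-bandwidth Gaussian above is a simplification for brevity.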

Start date
Tuesday, Feb. 4, 2025, 1:25 p.m.
End date
Tuesday, Feb. 4, 2025, 2:25 p.m.
Location

Lind Hall 325 or via Zoom