Large Data Limits and Scaling Laws of t-SNE
Data Science Seminar
Adam Pickarski
North Carolina State University
Abstract
We consider large-data asymptotics for t-distributed stochastic neighbor embedding (t-SNE), a widely used non-linear dimension reduction algorithm. We identify an appropriate continuum limit of the t-SNE objective function, which can be viewed as a combination of a kernel-based repulsion and an asymptotically vanishing Laplacian-type regularizer. As a consequence, we show that embeddings produced by the original t-SNE algorithm cannot have any consistent limit as $n \to \infty$. We propose a rescaled model which mitigates the asymptotic decay of the attractive energy and which does have a consistent limit. We derive algorithms from this limit and provide empirical results. We also propose an alternative large-data limit in which we alter not the model but the scale of the plots. We investigate the well-posedness of this limit in a one-dimensional setting and identify necessary conditions in higher dimensions.
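For orientation, the objective in question is the standard t-SNE loss of van der Maaten and Hinton (2008); the notation below is the conventional one and is not taken from the talk itself. For fixed input affinities $p_{ij}$, the loss minimized over embedding points $y_1, \dots, y_n$ is
$$
C(y_1, \dots, y_n) \;=\; \mathrm{KL}(P \,\|\, Q) \;=\; \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
q_{ij} \;=\; \frac{\bigl(1 + \|y_i - y_j\|^2\bigr)^{-1}}{\sum_{k \neq l} \bigl(1 + \|y_k - y_l\|^2\bigr)^{-1}}.
$$
Since $\sum_{i \neq j} p_{ij} = 1$, this splits, up to an additive constant, as
$$
C \;=\; \underbrace{\sum_{i \neq j} p_{ij} \log\bigl(1 + \|y_i - y_j\|^2\bigr)}_{\text{attraction}} \;+\; \underbrace{\log \sum_{k \neq l} \bigl(1 + \|y_k - y_l\|^2\bigr)^{-1}}_{\text{repulsion}},
$$
i.e., into the attractive and repulsive energies referred to in the abstract.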