The ideas in this talk represent a fusion of two different strands of research from the early 2000s: NLDR (nonlinear dimensionality reduction) and PCT (point-cloud topology). The two fields meet in the problem of finding circular coordinates for a data set. This new technology has possible applications in signal processing and dynamical systems.
The idea behind NLDR is to take a high-dimensional data set, perhaps obtained as a collection of scientific measurements, and to find a small set of real-valued coordinates that reveal meaningful parameters of the data. The classical linear instance of this is principal component analysis (PCA). The nonlinear paradigm was introduced by Josh Tenenbaum in the late 1990s. Two well-known algorithms, Isomap (Tenenbaum, de Silva, Langford) and LLE (Roweis, Saul), were published in 2000, and many other researchers have published NLDR algorithms since then. Each algorithm exploits a different aspect of the inherent geometry of the data in order to construct the coordinates.
Roughly over the same time period, several groups of researchers began developing tools and techniques for applying algebraic topology to scientific data. Here the idea is to detect the topological structure of a set of high-dimensional observed data points. The difficulty is that data are inherently noisy, and topological invariants are extremely sensitive to local noise. The early breakthrough came in 2000 with the publication of the persistence algorithm of Edelsbrunner, Letscher and Zomorodian. This framework gives robust versions of the classical invariants of algebraic topology (such as homology and Betti numbers), which can be used to estimate the topology (or "shape") of a noisy data set.
The two fields meet in the following way. From the NLDR side, one can generalize from real-valued coordinates to more general coordinates. We focus on circle-valued coordinates (such as angles). To discover these coordinates, we exploit not the geometry but the topology of the data. In order to do this robustly, it is necessary to use a persistence framework. I will indicate how these calculations are carried out, and give some examples of how one can exploit the resulting coordinates to empirically study time-series data and dynamical systems.
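To make the last step concrete, here is a toy sketch (not the authors' code) of the final smoothing stage of such a pipeline, under the assumption that the persistence computation has already produced an integer 1-cocycle representing a significant class in H^1. The complex is reduced to its simplest case, a cycle graph: the cocycle is replaced by its cohomologous real-valued representative of least L2 norm, which is then integrated to yield a circle-valued coordinate on the vertices.

```python
import numpy as np

# Toy illustration of "harmonic smoothing" for circular coordinates.
# Assumption: persistent cohomology has already identified an integer
# 1-cocycle z on the complex.  Here the complex is a cycle graph on n
# vertices and z is a generator of H^1: value 1 on one edge, 0 elsewhere.
n = 12
edges = [(i, (i + 1) % n) for i in range(n)]

z = np.zeros(n)
z[-1] = 1.0  # the cocycle winds once around the loop

# Coboundary matrix d, with (d f)(i, j) = f(j) - f(i) on each edge.
d = np.zeros((n, n))
for k, (i, j) in enumerate(edges):
    d[k, i] = -1.0
    d[k, j] = 1.0

# Replace z by the cohomologous real cocycle alpha = z - d f of least
# L2 norm; the least-squares residual is orthogonal to im(d), i.e.
# harmonic.  On a cycle graph this spreads the winding evenly: each
# edge carries 1/n.
f, *_ = np.linalg.lstsq(d, z, rcond=None)
alpha = z - d @ f

# Integrate alpha along the cycle to obtain a circle-valued coordinate
# theta: vertices -> R/Z, evenly spaced around the circle.
theta = np.concatenate(([0.0], np.cumsum(alpha[:-1]))) % 1.0
print(np.round(alpha, 3))  # each entry ~ 1/12
print(np.round(theta, 3))  # 0, 1/12, 2/12, ...
```

On a real data set the cycle graph is replaced by a Rips complex at a scale chosen from the persistence diagram, but the smoothing step is the same least-squares problem.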
My collaborators in this work are Mikael Vejdemo-Johansson, Dmitriy Morozov, and Primoz Skraba.
Vin de Silva is a mathematician specializing in applications of geometry and topology. Since the turn of the millennium, there has been a tremendous surge of interest in solving data analysis problems using classical ideas from geometry and algebraic topology. This has resulted in a wealth of new mathematical technologies for studying scientific data sets. Dr de Silva has had the good fortune to work with many fine researchers around the world, publishing research in machine learning, topological data analysis, sensor networks, tensor decomposition, and persistent topology. He believes that a little geometric insight can sometimes go a long way.