Multilevel graph-based methods for efficient data exploration [research project]

Funding agency

National Science Foundation (August 1, 2020 - July 31, 2023)

Investigators

Yousef Saad (professor)

Abstract

Graph theory helps scientists and engineers model various types of relations between entities in a set, whether members of a social network, or molecules in a chemical compound for example. Not surprisingly, with the advent of data-based methodologies that work by unraveling and exploiting relations between data items, graph theory tools are finding their way in a very broad range of applications. The primary goal of this project is to examine a class of methods that manipulate graphs, specifically by developing effective multilevel algorithms that take advantage of divide and conquer approaches. In multilevel techniques, smaller and smaller graphs are extracted from some original graph with the goal of keeping as much of its intrinsic information as possible. These smaller graphs are then employed instead of the original ones, resulting in significant gains in performance. This project addresses issues that are of great relevance to many current data-based methodologies and will be applicable across various disciplines. As such it will help promote interest in problems related to the current shift toward such methodologies because its research theme blends mathematical methods, innovations in algorithms, and applications. On the educational side, special courses and tutorials will be offered to graduate students from other disciplinary fields who wish to explore research in data sciences. This project will support one graduate student per year for each of the three years.

The rapid expansion of machine learning methodologies into a great variety of disciplines is pushing the demand for numerical methods that can effectively deal with large datasets. Among these methods, those based on graph representations of data figure prominently. The goal of this project is to develop effective multilevel algorithms that are rooted in graph theoretical approaches, for performing various machine learning tasks. A primary focus of the planned research is that of "graph coarsening", a technique whereby an original graph is substantially reduced in size by agglomerating nearby nodes together, to produce a faithful representative of the original graph. The project will exploit a class of methods based on multilevel coarsening, in which coarsening is applied recursively for a few levels. The ultimate goal of a multilevel coarsening approach is to make it possible to perform the heavy computations with the coarsened graph which is much smaller, resulting in much faster processing, with minimal loss in accuracy. Coarsening is an effective alternative to random sampling, a well-established method that consists of replacing the original data by a subset of its columns or rows that are selected at random or quasi- randomly. This project will study, both empirically and theoretically, various coarsening strategies. For example, coarsening will be studied from the angle of a projection method for approximating eigenvectors. Coarsening methods that try to preserve the eigenvectors exactly will also be studied. Among the many possible applications of graph coarsening the project will specifically consider their use in speeding up the training of a class of neural networks known as Graph Convolutional Networks (GCNs). A number of other research issues, all under the general theme of graph-based methods, will also be investigated. For example, the project will study how a form of hypergraph coarsening can be used to provide a solution to the "graph sparsification" problem, whereby a sparser version of a given graph is sought, or to the "column subset selection problem" which consists of selecting important rows (or columns) from a given data matrix.

More information on agency website

Multilevel graph-based methods for efficient data exploration

Keywords

computational mathematics, graph theory, machine learning

Share