Hypergraph Analytics: Modeling Higher-order Structures and Probabilities [thesis]

Author

Ankit Sharma (Ph.D. 2020)

Abstract

Data structured in the form of overlapping or non-overlapping sets are found in a variety of domains, sometimes explicitly but often subtly. For example, teams, which are of prime importance in industry and social science studies are sets of individuals; item sets in pattern mining of customer transactions are sets, and for various types of analysis in language studies a sentence can be considered as a set or bag of words. Although building models and inference algorithms for structured data has been an essential task in the fields of machine learning and statistics, research on set-like data remains less explored. Relationships between pairs of elements can be modeled as edges in a graph. However, for modeling relationships that involve all members of a set, hyperedges in a Hypergraph are more natural representations. Hypergraphs are less known graph-theoretic structure as compared to graphs. Because of this popularity graphs have been employed prolifically to model data of all kinds. Little attention is given to the fact that whether the data is naturally being generated as dyadic interactions or not. We think that much data is even deliberately converted to a graph for the sake of fitting it into a graph-based model and destroying the precious information present when it was originally generated.

This thesis describes analyzing complex group structured data from domains like social networks, customer transaction data, and general categorical data, via the lens of Hypergraphs. To do so, we propose the Hypergraph Analytics Framework, under which we shall be interested in three higher-level questions pertaining to the hypergraph modeling. Firstly, how to model higher-order hypergraph information and what kind of lower-order approximations are available or sufficient depending upon the problem at hand. This question is addressed across the thesis as we employ different hypergraph models contingent upon the problem at hand.

Secondly, we shall be interested in understanding what kind of inferences are possible over the hypergraph structure and what kind of probabilities can be learned. For this, we shall be dissecting the problem of hypergraph inference into various hyperedge prediction sub-problems and developing inference methods for each of them. We develop inference methods for both cross-sectional analysis: when we ignore the time information about group interactions into account, as well as longitudinal analysis: where we leverage temporal data. We also develop separate methods for conducting inference over observed and unobserved regions of the hypergraph structure. This variety of inference mechanisms on hypergraph structure together constitute the first part of the thesis, which we refer to as the Spatial Analysis within our Hypergraph Analytics framework.

Lastly, we are interested in learning what kinds of compression algorithms are possible for hypergraphs and how effective these techniques are. Here we develop techniques to compress the hypergraph topology to lower-dimensional latent space. We shall be chiefly considering hyperedge compression or hyperedge embeddings. We examine two different embedding approaches. First, is an algebraic approach which leverages leverage the relationship between hypergraphs and symmetric higher-order tensors. Symmetric tensor decomposition techniques are then developed to learn embeddings. Second, is a neural networks based solution which employs auto-encoders regularized by hyper-graph structure. Together, both these approaches constitute the second part of the thesis, which we refer to as Spectral Analysis within the proposed Hypergraph Analytics framework.

Link to full paper

Hypergraph Analytics: Modeling Higher-order Structures and Probabilities

Keywords

data mining

Share