AbstractMany real-world systems are complex, consisting of many entities with interactions among them. Our understanding of real-world complex systems has been significantly advanced by modelling these systems as networks. A network is a mathematical ab- straction of complex systems, representing entities and interactions by nodes and edges. Recent years have witnessed a rapid growth in the demand for analysing networks data, driven by the increased availability of large-scale, quality datasets. A common task in network analysis is to identify the “building blocks” of a network by finding divisions of nodes, such that nodes in the same division connect with the rest of the network in a similar way. This task is often referred to as community detection in networks. Community detection methods allow researchers to characterise network data from the perspective of connection pattern, which could convey important information about the functional and evolutionary mechanism of the underlying systems.
Recently, Bayesian inference based on generative network model has attracted great attention as a community detection method, which is mainly due to its principle infer- ence nature and formal implementation of the Occam’s razor. However, this method often relies on general models that simultaneously account for different kinds of com- munity structure. If the dominant structure in data is in fact restricted and simple, using general models could lead to sub-optimal fit to data.
This thesis concerns with developing Bayesian inference community detection methods that are tailored for a particular kind of structure - the assortative structure. A net- work is said to be assortative if it can be divided into subgroups of nodes, such that connections inside each of division are dense while between distinct divisions are sparse. To this end, we develop the Bayesian formulation of the degree-corrected planted par- tition model. Such model assumes the probability of an edge between a pair of nodes is dependant on whether they are from the same subgroups as well as their node- wise propensity of receiving an edge. This formulation leads to a novel method for extracting assortative structures and this method is one of the main contributions of this thesis. Compared with other existing methods, our proposed method has the ad- vantage of being robust against overfitting, which means our method will not report spurious community structures in random networks while other non-statistical, heuris- tic methods usually do. In deriving our proposed method, we clarify on an established equivalence between the popular modularity maximisation approach and maximum likelihood inference. Our analysis shows that the equivalence result is tenuous, since it relies on subjective choices of model parameters which lack of principle justifications.
We demonstrate the performance of our proposed method in both synthetic and empir- ical networks. In particular, we construct a large network corpus consisting of datasets which are diverse in terms of size and density. Using this network corpus, we find evidence that the degree-corrected planted partition model has the ability of achiev- ing better quality of fit in some empirical networks compared to existing models in some cases,. Moreover, the degree-corrected planted partition model has the potential of providing additional insight into data regarding high-resolution community struc- ture. Moreover, by conducting model selection in our network corpus, we find that assortativity is often too simplistic to be the dominant pattern in empirical networks.
Finally, we study the detectability of assortative community structures. In networks where all nodes receive identical number of edges on average, there exists a detectabil- ity threshold of the strength of community structure, below which no polynomial algo- rithms can detect the planted community structure better than random guessing. We conduct a numerical study to examine the effect of heterogeneity in the number of edges attaching to nodes on the detectability of assortative structures. Such effect has been analytically studied in a special case where networks have two equal-size communities. Our results provide further numerical evidence for the existing theoretical analysis and open the door to investigation about the detectability of community structures in more general settings, e.g. in networks consisting of more than two communities, which could have different extents of heterogeneity in degree distribution.
|Date of Award||22 Feb 2023|
|Supervisor||Tiago De Paula Peixoto (Supervisor) & Matthew Nunes (Supervisor)|
- network analysis
- Bayesian inference
- clustering algorithms
- machine learning