"Network Comparisons Using Sample Splitting"
Three-Toed Sloth 2016-05-06
Summary:
My fifth Ph.D. student is defending his thesis towards the end of the month:
- Lawrence Wang, Network Comparisons Using Sample Splitting
- Abstract: Many scientific questions about networks are actually network comparison problems: Could two networks have reasonably come from a common source? Are there specific differences? We outline a procedure that tests the hypothesis that multiple networks were drawn from the same probabilistic source. In addition, when the networks are indeed different, our procedure may characterize the differences between the sources.
- We first address the case where the two networks being compared share exactly the same nodes. We wish to use common parametric network models and the standard likelihood ratio test (LRT), but the infeasibility of computing the maximum likelihood estimate in our selected families of models complicates matters. However, we take advantage of the fact that the standard likelihood ratio test has a simple asymptotic distribution under a specific restriction of the model family. In addition, we show that a sample splitting approach is applicable: we can use part of the network data to choose an appropriate model space, and use the remaining network data to compute the LRT statistic and appeal to its asymptotic null distribution to obtain an appropriate p-value. Moreover, we show that while a single sample split results in a random p-value, we can do multiple sample splits and aggregate the resulting individual p-values. Sample splitting is a more general framework: nothing is particularly special about the specific hypothesis we decide to test. We illustrate a couple of extensions of the framework which also provide different ways to characterize differences in network models.
- We also address the more general case where the two networks being compared no longer share the same set of nodes. The main difficulty in this case is that there might not be an implicit alignment of the nodes in the two networks. Our procedure relies on the graphon model family, which can handle networks of any size and, more importantly, can be put in an aligned form that makes the models comparable. We show that the framework for alignment can be generalized, which allows this method to handle a larger class of models.
- Time and place: 3:30 pm on Monday, 25 April 2016 in Porter Hall A22
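To make the sample-splitting idea in the abstract concrete, here is a minimal toy sketch in Python. It is not the procedure from the thesis, and every specific choice in it is an assumption made for illustration: the two candidate models (Erdős–Rényi and a two-cell "planted partition" model with node group labels taken as known), the BIC rule for picking a model on the selection half of the dyads, and the aggregation rule (twice the median of the per-split p-values, capped at 1). All function names are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import median


def bernoulli_loglik(edges, dyads):
    """Maximized Bernoulli log-likelihood: `edges` successes in `dyads` trials."""
    if edges == 0 or edges == dyads:
        return 0.0
    p = edges / dyads
    return edges * math.log(p) + (dyads - edges) * math.log(1 - p)


def cell_counts(graph, dyads, cell_of):
    """(edges, dyads) per model cell; `graph` is a set of (i, j) pairs with i < j."""
    counts = {}
    for d in dyads:
        e, m = counts.get(cell_of(d), (0, 0))
        counts[cell_of(d)] = (e + int(d in graph), m + 1)
    return counts


def loglik(counts):
    return sum(bernoulli_loglik(e, m) for e, m in counts.values())


def pooled(c1, c2):
    """Counts for the null model: both networks share one probability per cell."""
    return {c: (c1[c][0] + c2[c][0], c1[c][1] + c2[c][1]) for c in c1}


def chi2_sf(x, df):
    """Chi-square upper tail; closed forms exist for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("toy example only supports df in {1, 2}")


def split_pvalue(g1, g2, n, labels, rng):
    """One random split: choose a model on one half of the dyads, test on the rest."""
    models = {
        "erdos_renyi": lambda d: 0,
        "planted_partition": lambda d: labels[d[0]] == labels[d[1]],
    }
    dyads = list(combinations(range(n), 2))
    rng.shuffle(dyads)
    half = len(dyads) // 2
    select, test = dyads[:half], dyads[half:]

    def bic(cell_of):  # assumed selection rule, fitted to the pooled networks
        c = pooled(cell_counts(g1, select, cell_of), cell_counts(g2, select, cell_of))
        return -2 * loglik(c) + len(c) * math.log(2 * len(select))

    cell_of = models[min(models, key=lambda name: bic(models[name]))]
    c1, c2 = cell_counts(g1, test, cell_of), cell_counts(g2, test, cell_of)
    # LRT: separate parameters per network versus shared parameters.
    stat = 2 * (loglik(c1) + loglik(c2) - loglik(pooled(c1, c2)))
    return chi2_sf(max(stat, 0.0), df=len(c1))


def compare_networks(g1, g2, n, labels, n_splits=50, seed=0):
    """Aggregate p-values over many random splits (assumed rule: 2 x median)."""
    rng = random.Random(seed)
    pvals = [split_pvalue(g1, g2, n, labels, rng) for _ in range(n_splits)]
    return min(1.0, 2 * median(pvals))
```

For example, `compare_networks(g1, g2, n=12, labels=[i % 2 for i in range(12)])` returns a single aggregated p-value for two edge sets on the same 12 nodes. Splitting over dyads is only sensible here because both toy models treat dyads as independent; richer model families would need a more careful split.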