Source author record

James D. Wilson

James D. Wilson appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: physics.soc-ph, Social and Information Networks, Methodology, Machine Learning, stat.OT

Catalog footprint

What is connected

8 works

5 topics

4 close collaborators


Preprint, 2020, arXiv

Analysis of Population Functional Connectivity Data via Multilayer Network Embeddings

Population analyses of functional connectivity have provided a rich understanding of how brain function differs across time, individual, and cognitive task. An important but challenging task in such population analyses is the identification of reliable features that describe the function of the brain, while accounting for individual heterogeneity. Our work is motivated by two particularly important challenges in this area: first, how can one analyze functional connectivity data over populations of individuals, and second, how can one use these analyses to infer group similarities and differences. Motivated by these challenges, we model population connectivity data as a multilayer network and develop the multi-node2vec algorithm, an efficient and scalable embedding method that automatically learns continuous node feature representations from multilayer networks. We use multi-node2vec to analyze resting state fMRI scans over a group of 74 healthy individuals and 60 patients with schizophrenia. We demonstrate how multilayer network embeddings can be used to visualize, cluster, and classify functional regions of the brain for these individuals. We furthermore compare the multilayer network embeddings of the two groups. We identify significant differences between the groups in the default mode network and salience network - findings that are supported by the triple network model theory of cognitive organization. Our findings reveal that multi-node2vec is a powerful and reliable method for analyzing multilayer networks.

Preprint, 2020, arXiv

Nonparametric Feature Impact and Importance

Practitioners use feature importance to rank and eliminate weak predictors during model development in an effort to simplify models and improve generality. Unfortunately, they also routinely conflate such feature importance measures with feature impact, the isolated effect of an explanatory variable on the response variable. This can lead to real-world consequences when importance is inappropriately interpreted as impact for business or medical insight purposes. The dominant approach for computing importances is through interrogation of a fitted model, which works well for feature selection, but gives distorted measures of feature impact. The same method applied to the same data set can yield different feature importances, depending on the model, leading us to conclude that impact should be computed directly from the data. While there are nonparametric feature selection algorithms, they typically provide feature rankings, rather than measures of impact or importance. They also typically focus on single-variable associations with the response. In this paper, we give mathematical definitions of feature impact and importance, derived from partial dependence curves, that operate directly on the data. To assess quality, we show that features ranked by these definitions are competitive with existing feature selection techniques using three real data sets for predictive tasks.

Preprint, 2020, arXiv

Technical Report: Partial Dependence through Stratification

Partial dependence curves (FPD), introduced by Friedman, are an important model interpretation tool, but are often not accessible to business analysts and scientists who typically lack the skills to choose, tune, and assess machine learning models. It is also common for the same partial dependence algorithm on the same data to give meaningfully different curves for different models, which calls into question their precision. Expertise is required to distinguish between model artifacts and true relationships in the data. In this paper, we contribute methods for computing partial dependence curves, for both numerical (StratPD) and categorical explanatory variables (CatStratPD), that work directly from training data rather than predictions of a model. Our methods provide a direct estimate of partial dependence, and rely on approximating the partial derivative of an unknown regression function without first fitting a model and then approximating its partial derivative. We investigate settings where contemporary partial dependence methods (including FPD, ALE, and SHAP methods) give biased results. Furthermore, we demonstrate that our approach works correctly on synthetic and plausibly on real data sets. Our goal is not to argue that model-based techniques are not useful. Rather, we hope to open a new line of inquiry into nonparametric partial dependence.
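For context, the model-based partial dependence that this abstract contrasts with can be sketched in a few lines. The example below illustrates Friedman-style partial dependence computed from a fitted model's predictions. It is not StratPD or CatStratPD. The function name and the `predict` callable are hypothetical and chosen only for illustration.

```python
import numpy as np


def friedman_partial_dependence(predict, X, feature, grid):
    """Model-based partial dependence for one feature.

    predict: callable mapping an (n, p) array to n predictions (assumed interface).
    X:       (n, p) data matrix.
    feature: column index of the feature of interest.
    grid:    values at which to evaluate the curve.
    """
    X = np.asarray(X, dtype=float)
    curve = []
    for value in grid:
        X_mod = X.copy()
        # Force every row to take the same value for the chosen feature,
        # leaving all other columns as observed.
        X_mod[:, feature] = value
        # Average the model's predictions over the data set.
        curve.append(float(np.mean(predict(X_mod))))
    return np.array(curve)


# Toy "fitted model": y = 2 * x0 + x1 (a stand-in, not a real trained model).
toy_predict = lambda X: 2.0 * X[:, 0] + X[:, 1]
toy_X = [[0.0, 1.0], [1.0, 3.0]]
toy_curve = friedman_partial_dependence(toy_predict, toy_X, feature=0, grid=[0.0, 1.0, 2.0])
```

Because the curve is an average of model predictions, a different fitted model on the same data can produce a different curve, which is the sensitivity the abstract describes.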

Preprint, 2016, arXiv

An overview and perspective on social network monitoring

In this expository paper we give an overview of some statistical methods for the monitoring of social networks. We discuss the advantages and limitations of various methods as well as some relevant issues. One of our primary contributions is to give the relationships between network monitoring methods and monitoring methods in engineering statistics and public health surveillance. We encourage researchers in the industrial process monitoring area to work on developing and comparing the performance of social network monitoring methods. We also discuss some of the issues in social network monitoring and give a number of research ideas.

Preprint, 2016, arXiv

Modeling and detecting change in temporal networks via a dynamic degree corrected stochastic block model

In many applications it is of interest to identify anomalous behavior within a dynamic interacting system. Such anomalous interactions are reflected by structural changes in the network representation of the system. We propose and investigate the use of a dynamic version of the degree corrected stochastic block model (DCSBM) to model and monitor dynamic networks that undergo a significant structural change. We apply statistical process monitoring techniques to the estimated parameters of the DCSBM to identify significant structural changes in the network. Application of our surveillance strategy to the dynamic U.S. Senate co-voting network reveals that we are able to detect significant changes in the network that reflect both times of cohesion and times of polarization among Republican and Democratic party members. These findings provide valuable insight about the evolution of the bipartisan political system in the United States. Our analysis demonstrates that the dynamic DCSBM monitoring procedure effectively detects local and global structural changes in dynamic networks. The DCSBM approach is an example of a more general framework that combines parametric random graph models and statistical process monitoring techniques for network surveillance.
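The idea of applying process monitoring to a sequence of estimated model parameters can be sketched with a basic Shewhart-style control chart. This is a generic illustration, not the paper's procedure. It assumes a parameter estimate (for example, a block connection propensity) has already been computed for each network snapshot, and the function name and the three-sigma default are assumptions.

```python
from statistics import mean, stdev


def shewhart_signals(phase1, phase2, width=3.0):
    """Flag monitored estimates that fall outside control limits.

    phase1: parameter estimates from a baseline period assumed to be stable.
    phase2: parameter estimates from the period being monitored.
    width:  half-width of the limits in baseline standard deviations.
    Returns the indices in phase2 that signal a change.
    """
    center = mean(phase1)
    spread = stdev(phase1)  # sample standard deviation of the baseline
    lower = center - width * spread
    upper = center + width * spread
    return [t for t, x in enumerate(phase2) if x < lower or x > upper]


# Hypothetical estimates, one per network snapshot.
baseline = [1.0, 1.2, 0.8, 1.1, 0.9]
monitored = [1.1, 0.7, 1.6, 1.0]
signals = shewhart_signals(baseline, monitored)
```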

Preprint, 2016, arXiv

Monitoring communication outbreaks among an unknown team of actors in dynamic networks

This paper investigates the detection of communication outbreaks among a small team of actors in time-varying networks. We propose monitoring plans for known and unknown teams based on generalizations of the exponentially weighted moving average (EWMA) statistic. For unknown teams, we propose an efficient neighborhood-based search to estimate a collection of candidate teams. This procedure dramatically reduces the computational complexity of an exhaustive search. Our procedure consists of two steps: communication counts between actors are first smoothed using a multivariate EWMA strategy. Densely connected teams are identified as candidates using a neighborhood search approach. These candidate teams are then monitored using a surveillance plan derived from a generalized EWMA statistic. Monitoring plans are established for collaborative teams, teams with a dominant leader, as well as for global outbreaks. We consider weighted heterogeneous dynamic networks, where the expected communication count between each pair of actors is potentially different across pairs and time, as well as homogeneous networks, where the expected communication count is constant across time and actors. Our monitoring plans are evaluated on a test bed of simulated networks as well as on the U.S. Senate co-voting network, which models the Senate voting patterns from 1857 to 2015. Our analysis suggests that our surveillance strategies can efficiently detect relevant and significant changes in dynamic networks.
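The smoothing step described above builds on the exponentially weighted moving average. A minimal univariate sketch follows. The paper's multivariate and generalized EWMA statistics are not reproduced here, and the smoothing weight, threshold, and function names are assumptions for illustration only.

```python
from typing import List, Optional, Sequence


def ewma(counts: Sequence[float], lam: float = 0.2, start: float = 0.0) -> List[float]:
    """Return z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = start."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must be in (0, 1]")
    smoothed = []
    z = start
    for x in counts:
        z = lam * x + (1.0 - lam) * z
        smoothed.append(z)
    return smoothed


def first_signal(counts: Sequence[float], lam: float,
                 baseline_mean: float, threshold: float) -> Optional[int]:
    """Index of the first time the EWMA exceeds the threshold, or None."""
    for t, z in enumerate(ewma(counts, lam, start=baseline_mean)):
        if z > threshold:
            return t
    return None


# Hypothetical communication counts for one pair of actors.
pair_counts = [2, 2, 2, 10, 10]
signal_time = first_signal(pair_counts, lam=0.5, baseline_mean=2.0, threshold=5.0)
```

A smaller `lam` gives more weight to history, which smooths out noise but delays detection of abrupt jumps.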

Preprint, 2016, arXiv

Stochastic Weighted Graphs: Flexible Model Specification and Simulation

In most domains of network analysis researchers consider networks that arise in nature with weighted edges. Such networks are routinely dichotomized in the interest of using available methods for statistical inference with networks. The generalized exponential random graph model (GERGM) is a recently proposed method used to simulate and model the edges of a weighted graph. The GERGM specifies a joint distribution for an exponential family of graphs with continuous-valued edge weights. However, current estimation algorithms for the GERGM only allow inference on a restricted family of model specifications. To address this issue, we develop a Metropolis-Hastings method that can be used to estimate any GERGM specification, thereby significantly extending the family of weighted graphs that can be modeled with the GERGM. We show that new flexible model specifications are capable of avoiding likelihood degeneracy and efficiently capturing network structure in applications where such models were not previously available. We demonstrate the utility of this new class of GERGMs through application to two real network data sets, and we further assess the effectiveness of our proposed methodology by simulating non-degenerate model specifications from the well-studied two-stars model. A working R version of the GERGM code is available in the supplement and will be incorporated in the gergm CRAN package.

Preprint, 2014, arXiv

A testing based extraction algorithm for identifying significant communities in networks

A common and important problem arising in the study of networks is how to divide the vertices of a given network into one or more groups, called communities, in such a way that vertices of the same community are more interconnected than vertices belonging to different ones. We propose and investigate a testing based community detection procedure called Extraction of Statistically Significant Communities (ESSC). The ESSC procedure is based on p-values for the strength of connection between a single vertex and a set of vertices under a reference distribution derived from a conditional configuration network model. The procedure automatically selects both the number of communities in the network and their size. Moreover, ESSC can handle overlapping communities and, unlike the majority of existing methods, identifies "background" vertices that do not belong to a well-defined community. The method has only one parameter, which controls the stringency of the hypothesis tests. We investigate the performance and potential use of ESSC and compare it with a number of existing methods, through a validation study using four real network data sets. In addition, we carry out a simulation study to assess the effectiveness of ESSC in networks with various types of community structure, including networks with overlapping communities and those with background vertices. These results suggest that ESSC is an effective exploratory tool for the discovery of relevant community structure in complex network systems. Data and software are available at http://www.unc.edu/~jameswd/research.html.
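To make the vertex-to-set p-value concrete, the sketch below uses a simple binomial approximation. It assumes that, under a configuration-style null, each edge of vertex u lands in the set B with probability equal to B's share of the total degree. This approximation and the function names are assumptions for illustration. The abstract does not state the exact reference distribution, and the code is not the ESSC software.

```python
from math import comb


def binomial_upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def connection_p_value(adj, u, B):
    """Approximate p-value for how strongly vertex u connects to the set B.

    adj: dict mapping each vertex to the set of its neighbors (undirected graph).
    Assumed null: each of u's edges hits B with probability
    (sum of degrees in B) / (sum of all degrees).
    """
    degrees = {v: len(nbrs) for v, nbrs in adj.items()}
    total_degree = sum(degrees.values())
    p_B = sum(degrees[v] for v in B) / total_degree
    observed = sum(1 for v in adj[u] if v in B)
    return binomial_upper_tail(degrees[u], p_B, observed)


# Toy graph: a triangle a-b-c with a pendant vertex d attached to c.
toy_adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
p_a = connection_p_value(toy_adj, "a", {"b", "c"})
```

A small p-value indicates that u has more connections into B than the null would suggest. In a testing-based procedure, a single threshold on such p-values would play the role of the stringency parameter mentioned in the abstract.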
