Source author record

Bailey K. Fosdick

Bailey K. Fosdick appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Methodology · Applications · Populations and Evolution · Social and Information Networks

Catalog footprint

What is connected

6 works · 4 topics · 4 close collaborators


Preprint · 2022 · arXiv

Regression of exchangeable relational arrays

Relational arrays represent measures of association between pairs of actors, often in varied contexts or over time. Trade flows between countries, financial transactions between individuals, contact frequencies between school children in classrooms, and dynamic protein-protein interactions are all examples of relational arrays. Elements of a relational array are often modeled as a linear function of observable covariates. Uncertainty estimates for regression coefficient estimators (and ideally the coefficient estimators themselves) must account for dependence between elements of the array, e.g., relations involving the same actor. Existing standard error estimators that recognize such relational dependence rely on estimating extremely complex, heterogeneous structure across actors. This paper develops a new class of parsimonious coefficient and standard error estimators for regressions of relational arrays. We leverage an exchangeability assumption to derive standard error estimators that pool information across actors and are substantially more accurate than existing estimators in a variety of settings. This exchangeability assumption is pervasive in network and array models in the statistics literature, but has not previously been considered when adjusting for dependence in a regression setting with relational data. We demonstrate improvements in inference theoretically, via a simulation study, and by analysis of a data set involving international trade.
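The pooling idea can be illustrated with a minimal Python sketch. It assumes the residuals of a directed relational array are jointly exchangeable, so the covariance between two relations depends only on how the two dyads overlap (same pair, reciprocal pair, shared sender, shared receiver, or one dyad's receiver being the other's sender), and that dyads with no actor in common are uncorrelated. The function name `pooled_exchangeable_cov`, the pattern labels and the example residuals are hypothetical; this is not the estimator derived in the paper.

```python
from itertools import product


def pooled_exchangeable_cov(e):
    """Pool residual cross-products by how two directed dyads overlap.

    e is an n x n list of lists of regression residuals; diagonal entries
    (self-relations) are ignored. Under the assumed joint exchangeability,
    every pair of dyads with the same overlap pattern shares one covariance.
    Dyads with no actor in common are assumed uncorrelated and skipped.
    """
    n = len(e)
    dyads = [(i, j) for i, j in product(range(n), repeat=2) if i != j]
    sums, counts = {}, {}
    for (i, j), (k, l) in product(dyads, repeat=2):
        if (i, j) == (k, l):
            pattern = "variance"
        elif (i, j) == (l, k):
            pattern = "reciprocal"
        elif i == k:
            pattern = "same_sender"
        elif j == l:
            pattern = "same_receiver"
        elif j == k or i == l:
            pattern = "chain"  # one dyad's receiver is the other's sender
        else:
            continue  # no shared actor
        sums[pattern] = sums.get(pattern, 0.0) + e[i][j] * e[k][l]
        counts[pattern] = counts.get(pattern, 0) + 1
    return {p: sums[p] / counts[p] for p in sums}


# Hypothetical residuals for three actors (diagonal unused).
resid = [[0.0, 0.5, -0.2],
         [0.3, 0.0, 0.1],
         [-0.4, 0.2, 0.0]]
print(pooled_exchangeable_cov(resid))
```

Because each pattern's value is an average over many actor combinations, none of them has to be estimated actor by actor; that is one concrete reading of pooling information across actors. In a fuller analysis these few values would fill an error covariance matrix for a sandwich-type standard error of the coefficients, a step omitted here.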

Preprint · 2019 · arXiv

Inferring Influence Networks from Longitudinal Bipartite Relational Data

Longitudinal bipartite relational data characterize the evolution of relations between pairs of actors, where actors are of two distinct types and relations exist only between actors of different types. A common goal is to understand the temporal dependencies, specifically which actor relations incite later actor relations. There are two existing approaches to this problem. The first approach projects the bipartite data in each time period to a unipartite network and uses existing unipartite network models. Unfortunately, information is lost in computing the projection, and generative models for networks obtained through this process are scarce. The second approach represents dependencies using two unipartite influence networks, one for each actor type. Existing models taking this approach are bilinear in the influence networks, creating challenges in computation and interpretation. We propose a novel generative model that permits estimation of weighted, directed influence networks and does not suffer from these shortcomings. The proposed model is linear in the influence networks, permitting inference using off-the-shelf software tools. We prove that our estimator is consistent in cases of model misspecification and nearly asymptotically equivalent to the bilinear estimator. We demonstrate the performance of the proposed model in simulation studies and an analysis of weekly international state interactions.
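The practical difference between a bilinear and a linear model can be checked numerically. This sketch assumes the previous period's relations form an n x m matrix `Y_prev` (rows are actors of one type, columns actors of the other), `A` is an n x n influence matrix and `B` an m x m influence matrix, and compares a bilinear mean `A Y_prev B` with a linear mean `A Y_prev + Y_prev B`. These parameterizations and all names are illustrative assumptions, not necessarily the exact forms used in the paper.

```python
def matmul(X, Y):
    """Plain-Python matrix product of nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def madd(X, Y):
    """Elementwise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def bilinear_mean(A, B, Y_prev):
    # A (n x n) and B (m x m) enter as a product: A @ Y_prev @ B
    return matmul(matmul(A, Y_prev), B)


def linear_mean(A, B, Y_prev):
    # A and B enter additively: A @ Y_prev + Y_prev @ B
    return madd(matmul(A, Y_prev), matmul(Y_prev, B))


Y_prev = [[1, 2], [3, 4]]  # hypothetical 2 x 2 relations from the previous period
eye = [[1, 0], [0, 1]]
eye2 = [[2, 0], [0, 2]]    # equals eye + eye

# Linear: the mean at (eye + eye, eye + eye) equals the sum of the means at (eye, eye).
print(linear_mean(eye2, eye2, Y_prev)
      == madd(linear_mean(eye, eye, Y_prev), linear_mean(eye, eye, Y_prev)))  # True
# Bilinear: the same check fails, since doubling both A and B quadruples the mean.
print(bilinear_mean(eye2, eye2, Y_prev)
      == madd(bilinear_mean(eye, eye, Y_prev), bilinear_mean(eye, eye, Y_prev)))  # False
```

Because the linear mean is additive in (A, B), each of its entries is a linear combination of the entries of A and B, so estimating them reduces to an ordinary least-squares problem that standard regression software can solve. The bilinear mean lacks this property.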

Preprint · 2015 · arXiv

Categorical Data Fusion Using Auxiliary Information

In data fusion, analysts seek to combine information from two databases composed of disjoint sets of individuals, in which some variables appear in both databases and other variables appear in only one database. Most data fusion techniques rely on variants of conditional independence assumptions. When these assumptions are inappropriate, they can result in unreliable inferences. We propose a data fusion technique that allows analysts to easily incorporate auxiliary information on the dependence structure of variables not observed jointly; we refer to this auxiliary information as glue. With this technique, we fuse two marketing surveys from the book publisher HarperCollins using glue from the online, rapid-response polling company CivicScience. The fused data enable estimation of associations between people's preferences for authors and their preferences for ways of learning about new books. The analysis also serves as a case study on the potential for using online surveys to aid data fusion.
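For contrast, the conditional independence assumption that the abstract describes as standard can be sketched for categorical data. File A records a shared variable X with Y, file B records X with Z, and the never-jointly-observed dependence between Y and Z is filled in by assuming P(y, z | x) = P(y | x) P(z | x). This is the baseline, not the glue-based method; the record layout, the name `fuse_under_ci` and the example records are hypothetical.

```python
from collections import Counter


def fuse_under_ci(file_a, file_b):
    """Estimate P(x, y, z) from file A (x, y) records and file B (x, z) records.

    Assumes Y and Z are conditionally independent given X:
    P(x, y, z) = P(x) * P(y | x) * P(z | x). P(x) pools both files.
    """
    count_xy, count_xz = Counter(file_a), Counter(file_b)
    count_x_a = Counter(x for x, _ in file_a)
    count_x_b = Counter(x for x, _ in file_b)
    n_total = len(file_a) + len(file_b)
    joint = {}
    for (x, y), n_xy in count_xy.items():
        for (x_b, z), n_xz in count_xz.items():
            if x_b != x:
                continue  # only combine records that share the value of X
            p_x = (count_x_a[x] + count_x_b[x]) / n_total
            joint[(x, y, z)] = p_x * (n_xy / count_x_a[x]) * (n_xz / count_x_b[x])
    return joint


# Hypothetical records: X is shared, Y appears only in A, Z only in B.
file_a = [("u", "a"), ("u", "b"), ("v", "a"), ("v", "a")]
file_b = [("u", "c"), ("u", "c"), ("v", "c"), ("v", "d")]
for cell, p in sorted(fuse_under_ci(file_a, file_b).items()):
    print(cell, p)
```

In this simple version, values of X that appear in only one file receive no probability mass. Auxiliary information (glue) on the Y-Z dependence is what would replace the product P(y | x) P(z | x).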

Preprint · 2014 · arXiv

Separable factor analysis with applications to mortality data

Human mortality data sets can be expressed as multiway data arrays, the dimensions of which correspond to categories by which mortality rates are reported, such as age, sex, country and year. Regression models for such data typically assume an independent error distribution or an error model that allows for dependence along at most one or two dimensions of the data array. However, failing to account for other dependencies can lead to inefficient estimates of regression parameters, inaccurate standard errors and poor predictions. An alternative to assuming independent errors is to allow for dependence along each dimension of the array using a separable covariance model. The drawback is that the number of parameters in this model increases rapidly with the dimensions of the array and, for many arrays, maximum likelihood estimates of the covariance parameters do not exist. In this paper, we propose a submodel of the separable covariance model that gives the covariance matrix for each dimension a factor-analytic structure. This model can be viewed as an extension of factor analysis to array-valued data, as it uses a factor model to estimate the covariance along each dimension of the array. We discuss properties of this model as they relate to ordinary factor analysis, describe maximum likelihood and Bayesian estimation methods, and provide a likelihood ratio testing procedure for selecting the factor model ranks. We apply this methodology to the analysis of data from the Human Mortality Database and show in a cross-validation experiment that it outperforms simpler methods. Additionally, we use this model to impute mortality rates for countries that are missing mortality data for several years. Unlike other approaches, our methodology is able to estimate similarities between the mortality rates of countries, time periods and sexes, and to use this information to assist with the imputations.
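The parameter growth that motivates the factor-analytic submodel can be worked out by simple counting. This sketch assumes hypothetical array sizes (10 age groups, 2 sexes, 20 countries, 30 years) and hypothetical factor ranks, and uses standard counts: p(p+1)/2 for an unrestricted p x p covariance, pr + p - r(r-1)/2 for a rank-r factor-analytic covariance (loadings plus uniquenesses, less rotational freedom), and K - 1 scale constraints for a Kronecker product of K matrices. Identifiability conditions beyond rotation and scale are ignored, and the function names are hypothetical.

```python
from math import prod


def unstructured_params(dims):
    """Free parameters in an unrestricted covariance of the whole array."""
    n = prod(dims)
    return n * (n + 1) // 2


def separable_params(dims):
    """One unrestricted covariance per dimension, less K - 1 scale constraints."""
    return sum(p * (p + 1) // 2 for p in dims) - (len(dims) - 1)


def separable_factor_params(dims, ranks):
    """Rank-r factor-analytic covariance per dimension, less K - 1 scale constraints.

    Each p x p factor covariance has p*r loadings plus p uniquenesses, less
    r(r-1)/2 for rotation; rank 0 means a diagonal covariance.
    """
    per_dim = sum(p * r + p - r * (r - 1) // 2 for p, r in zip(dims, ranks))
    return per_dim - (len(dims) - 1)


dims = [10, 2, 20, 30]  # hypothetical: age groups, sexes, countries, years
ranks = [2, 0, 3, 2]    # hypothetical factor ranks per dimension
print(unstructured_params(dims), separable_params(dims),
      separable_factor_params(dims, ranks))  # 72006000 730 194
```

With these sizes, the unstructured covariance of the full array has 72,006,000 parameters, the separable model 730, and the separable factor model 194.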

Preprint · 2013 · arXiv

Testing and Modeling Dependencies Between a Network and Nodal Attributes

Network analysis is often focused on characterizing the dependencies between network relations and node-level attributes. Potential relationships are typically explored by modeling the network as a function of the nodal attributes or by modeling the attributes as a function of the network. These methods require specification of the exact nature of the association between the network and attributes, reduce the network data to a small number of summary statistics, and are unable to provide predictions simultaneously for missing attribute and network information. Existing methods that model the attributes and network jointly also assume the data are fully observed. In this article, we introduce a unified approach to analysis that addresses these shortcomings. We use a latent variable model to obtain a low-dimensional representation of the network in terms of node-specific network factors, and we use a test of dependence between the network factors and attributes as a surrogate for a test of dependence between the network and attributes. We propose a formal testing procedure to determine whether dependencies exist between the network factors and attributes. We also introduce a joint model for the network and attributes, for use if the test rejects, that can capture a variety of dependence patterns and can be used to make inference and predictions for missing observations.
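The paper's latent variable model and formal test are not reproduced here. As a much simpler stand-in, this sketch takes the leading eigenvector of an undirected adjacency matrix as a one-dimensional node-specific network factor and tests its association with a numeric attribute by permuting the attribute. It assumes a connected, non-regular graph (a regular graph gives a constant eigenvector, for which correlation is undefined). The function names, the example graph and the attribute values are hypothetical.

```python
import random


def leading_factor(adj, iters=200):
    """Unit-length leading eigenvector of A + I by power iteration.

    For a connected undirected graph, adding the identity keeps A's
    eigenvectors but makes the top eigenvalue strictly dominant in absolute
    value, so the iteration converges.
    """
    n = len(adj)
    v = [1.0] * n
    for _ in range(iters):
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def corr(a, b):
    """Pearson correlation; both inputs must be non-constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def permutation_test(factor, attr, n_perm=999, seed=0):
    """Observed |corr| and its permutation p-value under no dependence."""
    rng = random.Random(seed)
    observed = abs(corr(factor, attr))
    shuffled = list(attr)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(corr(factor, shuffled)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical undirected graph: node 0 is a hub; nodes 0, 1, 2 form a triangle.
adj = [[0, 1, 1, 1, 1],
       [1, 0, 1, 0, 0],
       [1, 1, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0]]
attr = [5.0, 3.0, 3.0, 1.0, 1.0]  # hypothetical nodal attribute
factor = leading_factor(adj)
print(permutation_test(factor, attr))
```

Testing the factor rather than the raw network mirrors the surrogate idea in the abstract: a dependence question about an n x n relational matrix becomes one about n pairs of numbers. The permutation test is used here only because it needs no distributional assumptions.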

Preprint · 2012 · arXiv

Regional Probabilistic Fertility Forecasting by Modeling Between-Country Correlations

The United Nations (UN) Population Division is considering producing probabilistic projections for the total fertility rate (TFR) using the Bayesian hierarchical model of Alkema et al. (2011), which produces predictive distributions of TFR for individual countries. The UN is interested in publishing probabilistic projections for aggregates of countries, such as regions and trading blocs. This requires joint probabilistic projections of future country-specific TFRs, taking account of the correlations between them. We propose an extension of the Bayesian hierarchical model that allows for probabilistic projection of TFR for any set of countries. We model the correlation between country forecast errors as a linear function of time-invariant covariates, namely whether the countries are contiguous, whether they had a common colonizer after 1945, and whether they are in the same UN region. The resulting correlation model is incorporated into the Bayesian hierarchical model's error distribution. We produce predictive distributions of TFR for 1990-2010 for each of the UN's primary regions. We find that the proportions of the observed values that fall within the prediction intervals from our method are closer to their nominal levels than those produced by the current model. Our results suggest that a significant proportion of the correlation between forecast errors for TFR in different countries is due to countries' geographic proximity to one another, and that if this correlation is accounted for, the quality of probabilistic projections of TFR for regions and other aggregates is improved.
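Why between-country correlation matters for aggregates can be sketched with a small calculation. The sketch assumes a regional forecast error is a weighted sum of country forecast errors, so its variance is the double sum of w_i w_j sd_i sd_j rho_ij, and it builds rho_ij from the three pairwise covariates named above through a linear predictor. The coefficients, weights, standard deviations and function names are hypothetical, not estimates from the paper. A linear predictor does not guarantee a positive semidefinite correlation matrix, and the sketch does not check for one.

```python
def pair_correlation(contiguous, common_colonizer, same_region, coef):
    """Linear predictor for the correlation of two countries' forecast errors."""
    b0, b1, b2, b3 = coef
    return b0 + b1 * contiguous + b2 * common_colonizer + b3 * same_region


def correlation_matrix(n, pair_covariates, coef):
    """Symmetric correlation matrix; unlisted pairs get all-zero covariates.

    Not checked for positive semidefiniteness.
    """
    rho = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            covs = pair_covariates.get((i, j), (0, 0, 0))
            rho[i][j] = rho[j][i] = pair_correlation(*covs, coef)
    return rho


def aggregate_error_variance(weights, sds, rho):
    """Var(sum_i w_i e_i) = sum_i sum_j w_i w_j sd_i sd_j rho_ij."""
    n = len(weights)
    return sum(weights[i] * weights[j] * sds[i] * sds[j] * rho[i][j]
               for i in range(n) for j in range(n))


# Hypothetical coefficients and three countries: 0 and 1 are contiguous and in
# the same region, 1 and 2 share a colonizer, 0 and 2 share nothing.
coef = (0.05, 0.2, 0.1, 0.15)
pairs = {(0, 1): (1, 0, 1), (1, 2): (0, 1, 0)}
rho = correlation_matrix(3, pairs, coef)
weights, sds = [0.5, 0.3, 0.2], [0.4, 0.3, 0.5]
independent = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(aggregate_error_variance(weights, sds, rho),
      aggregate_error_variance(weights, sds, independent))
```

With positive correlations the aggregate variance exceeds the independent-errors value; for two equally weighted countries with unit standard deviations and correlation 0.5 it is 0.75 rather than 0.5. Ignoring the correlation would therefore make regional prediction intervals too narrow.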