Source author record

Xiaoyue Niu

Xiaoyue Niu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Applications, math.ST, Methodology, Statistics Theory, Machine Learning, Social and Information Networks

Catalog footprint

What is connected

8 works

6 topics

4 close collaborators

Preprint, 2022, arXiv

Perfect Spectral Clustering with Discrete Covariates

Among community detection methods, spectral clustering enjoys two desirable properties: computational efficiency and theoretical guarantees of consistency. Most studies of spectral clustering consider only the edges of a network as input to the algorithm. Here we consider the problem of performing community detection in the presence of discrete node covariates, where network structure is determined by a combination of a latent block model structure and homophily on the observed covariates. We propose a spectral algorithm that we prove achieves perfect clustering with high probability on a class of large, sparse networks with discrete covariates, effectively separating latent network structure from homophily on observed covariates. To our knowledge, our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering in the setting where edge formation is dependent on both latent and observed factors.

Preprint, 2020, arXiv

Evaluating the relative contribution of data sources in a Bayesian analysis with the application of estimating the size of hard to reach populations

When using multiple data sources in an analysis, it is important to understand the influence of each data source on the analysis and the consistency of the data sources with each other and the model. We suggest the use of a retrospective value of information framework in order to address such concerns. Value of information methods can be computationally difficult. We illustrate the use of computational methods that allow these methods to be applied even in relatively complicated settings. In illustrating the proposed methods, we focus on an application in estimating the size of hard-to-reach populations. Specifically, we consider estimating the number of injection drug users in Ukraine by combining all available data sources spanning over half a decade and numerous sub-national areas in Ukraine. This application is of interest to public health researchers because this hard-to-reach population plays a large role in the spread of HIV. We apply a Bayesian hierarchical model and evaluate the contribution of each data source in terms of absolute influence, expected influence, and level of surprise. Finally, we apply value of information methods to inform suggestions on future data collection.

Preprint, 2020, arXiv

What Can We Learn from the Travelers Data in Detecting Disease Outbreaks -- A Case Study of the COVID-19 Epidemic

Background: Travel is a potent force in the emergence of disease. We discussed how traveler case reports could aid in the timely detection of a disease outbreak. Methods: Using the traveler data, we estimated a few indicators of the epidemic that affected decision making and policy, including the exponential growth rate, the doubling time, and the probability of severe cases exceeding the hospital capacity, in the initial phase of the COVID-19 epidemic in multiple countries. We imputed the arrival dates when they were missing. We compared the estimates from the traveler data to the ones from domestic data. We quantitatively evaluated the influence of each case report, and of knowing the arrival date, on the estimation. Findings: We estimated the travel origin's daily exponential growth rate and examined the date from which the growth rate was consistently above 0.1 (equivalent to doubling time < 7 days). We found those dates were very close to the dates on which critical decisions were made, such as city lockdowns and national emergency announcements. Using only the traveler data, if the assumed epidemic start date was relatively accurate and the traveler sample was representative of the general population, the growth rate estimated from the traveler data was consistent with the domestic data. We also discussed situations in which the traveler data could lead to biased estimates. From the data influence study, we found more recent travel cases had a larger influence on each day's estimate, and the influence of each case report got smaller as more cases became available. We provided the minimum number of exported cases needed to determine whether the local epidemic growth rate was above a certain level, and developed a user-friendly Shiny App to accommodate various scenarios.
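
The link between a daily exponential growth rate of 0.1 and a doubling time under 7 days follows from the standard relation doubling time = ln(2) / growth rate. The short Python sketch below only illustrates that arithmetic; the function name `doubling_time` is hypothetical and is not taken from the paper or its Shiny App.

```python
import math


def doubling_time(growth_rate: float) -> float:
    """Days needed for case counts to double under exponential growth.

    Assumes cases grow as exp(growth_rate * t) with t in days, so the
    doubling time is ln(2) / growth_rate. Illustrative helper only.
    """
    if growth_rate <= 0:
        raise ValueError("growth_rate must be positive for doubling to occur")
    return math.log(2) / growth_rate


# A daily growth rate of 0.1 gives a doubling time just under 7 days.
days = doubling_time(0.1)
print(f"doubling time at r = 0.1: {days:.2f} days")
```

Under this relation, any growth rate above 0.1 per day corresponds to a doubling time shorter than about 6.93 days, which is consistent with the "< 7 days" threshold quoted in the abstract.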

Preprint, 2016, arXiv

Incorporating Hierarchical Structure Into Dynamic Systems: An Application Of Estimating HIV Epidemics At Sub-National And Sub-Population Level

Dynamic models have been successfully used in producing estimates of HIV epidemics at national level, due to their epidemiological nature and their ability to simultaneously estimate prevalence, incidence, and mortality rates. Recently, HIV interventions and policies have required more information at sub-national and sub-population levels to support local planning, decision making and resource allocation. Unfortunately, many areas and high-risk groups lack sufficient data for deriving stable and reliable results, and this is a critical technical barrier to more stratified estimates. One solution is to borrow information from other areas and groups within the same country. However, directly assuming hierarchical structures within the HIV dynamic models is complicated and computationally time consuming. In this paper, we propose a simple and innovative way to incorporate the hierarchical information into the dynamic systems by using auxiliary data. The proposed method efficiently uses information from multiple areas and risk groups within each country without increasing the computational burden. As a result, the new model improves predictive ability in general with especially significant improvement in areas and risk groups with sparse data.

Preprint, 2015, arXiv

Estimating HIV Epidemics for Sub-National Areas

As the global HIV pandemic enters its fourth decade, increasing numbers of surveillance sites have been established which allows countries to look into the epidemics at a finer scale, e.g. at sub-national levels. Currently, the epidemic models have been applied independently to the sub-national areas within countries. However, the availability and quality of the data vary widely, which leads to biased and unreliable estimates for areas with very few data. We propose to overcome this issue by introducing the dependence of the parameters across areas in a mixture model. The joint distribution of the parameters in multiple areas can be approximated directly from the results of independent fits without needing to refit the data or unpack the software. As a result, the mixture model has better predictive ability than the independent model as shown in examples of multiple countries in Sub-Saharan Africa.

Preprint, 2014, arXiv

A Hierarchical Model for Estimating HIV Epidemics

As the global HIV pandemic enters its fourth decade, increasing numbers of surveillance sites have been established which allows countries to look into the epidemics at a finer scale, e.g. at sub-national level. However, the epidemic models have been applied independently to the sub-national areas within countries. An important technical barrier is that the availability and quality of the data vary widely from area to area, and many areas lack data for deriving stable and reliable results. To improve the accuracy of the results in areas with little data, we propose a hierarchical model that utilizes information efficiently by assuming similar characteristics of the epidemics across areas within one country. The joint distribution of the parameters in the hierarchical model can be approximated directly from the results of independent fits without needing to refit the data. As a result, the hierarchical model has better predictive ability than the independent model as shown in examples of multiple countries in Sub-Saharan Africa.

Preprint, 2014, arXiv

Information bounds for Gaussian copulas

Often of primary interest in the analysis of multivariate data are the copula parameters describing the dependence among the variables, rather than the univariate marginal distributions. Since the ranks of a multivariate dataset are invariant to changes in the univariate marginal distributions, rank-based estimators are natural candidates for semiparametric copula estimation. Asymptotic information bounds for such estimators can be obtained from an asymptotic analysis of the rank likelihood, that is, the probability of the multivariate ranks. In this article, we obtain limiting normal distributions of the rank likelihood for Gaussian copula models. Our results cover models with structured correlation matrices, such as exchangeable or circular correlation models, as well as unstructured correlation matrices. For all Gaussian copula models, the limiting distribution of the rank likelihood ratio is shown to be equal to that of a parametric likelihood ratio for an appropriately chosen multivariate normal model. This implies that the semiparametric information bounds for rank-based estimators are the same as the information bounds for estimators based on the full data, and that the multivariate normal distributions are least favorable.

Preprint, 2011, arXiv

A covariance regression model

Classical regression analysis relates the expectation of a response variable to a linear combination of explanatory variables. In this article, we propose a covariance regression model that parameterizes the covariance matrix of a multivariate response vector as a parsimonious quadratic function of explanatory variables. The approach is analogous to the mean regression model, and is similar to a factor analysis model in which the factor loadings depend on the explanatory variables. Using a random-effects representation, parameter estimation for the model is straightforward using either an EM-algorithm or an MCMC approximation via Gibbs sampling. The proposed methodology provides a simple but flexible representation of heteroscedasticity across the levels of an explanatory variable, improves estimation of the mean function and gives better calibrated prediction regions when compared to a homoscedastic model.
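
One way to read the "quadratic function of explanatory variables" and the random-effects representation mentioned above is sketched below. The symbols are assumptions introduced here for illustration and do not appear in the catalog record: $\Psi$ is a baseline covariance matrix, $B$ a coefficient matrix, $\mu_x$ the mean function, and $\gamma$ a scalar random effect.

```latex
% Random-effects representation (illustrative notation):
y = \mu_x + \gamma\, B x + \epsilon,
\qquad \gamma \sim N(0, 1),
\qquad \epsilon \sim N(0, \Psi),
\qquad \gamma \text{ independent of } \epsilon .

% Marginalizing over \gamma gives a covariance that is quadratic in x:
\operatorname{Cov}[y \mid x]
  = \operatorname{E}[\gamma^2]\, B x x^{\top} B^{\top} + \Psi
  = \Psi + B x x^{\top} B^{\top} .
```

In this reading, the term $B x$ plays the role of a factor loading vector that depends on the explanatory variables, which matches the factor-analysis analogy in the abstract.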
