Source author record

Huiyan Sang

Huiyan Sang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Applications · Machine Learning · Methodology

Catalog footprint

What is connected

4 works

3 topics

4 close collaborators


preprint · 2022 · arXiv

Why the Rich Get Richer? On the Balancedness of Random Partition Models

Random partition models are widely used in Bayesian methods for various clustering tasks, such as mixture models, topic models, and community detection problems. While the number of clusters induced by random partition models has been studied extensively, another important model property, the balancedness of the partition, has been largely neglected. We formulate a framework to define and theoretically study the balancedness of exchangeable random partition models, by analyzing how a model assigns probabilities to partitions with different levels of balancedness. We demonstrate that the "rich-get-richer" characteristic of many existing popular random partition models is an inevitable consequence of two common assumptions: product-form exchangeability and projectivity. We propose a principled way to compare the balancedness of random partition models, which gives a better understanding of which models work better, and which do not, for different applications. We also introduce the "rich-get-poorer" random partition models and illustrate their application to entity resolution tasks.
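The "rich-get-richer" behaviour can be seen in a small simulation. The sketch below assumes the Chinese restaurant process as a stand-in for a popular random partition model. The abstract does not name a specific model, so this choice, the function name `sample_crp_sizes`, and the concentration parameter `alpha` are all illustrative assumptions rather than the paper's construction.

```python
import random


def sample_crp_sizes(n, alpha, rng):
    """Sample cluster sizes for n items from a Chinese restaurant process.

    Illustrative only: the CRP is an assumed example of a random
    partition model, not the model analysed in the paper.
    """
    sizes = []  # sizes[k] = number of items currently in cluster k
    for i in range(n):
        # Item i joins existing cluster k with probability sizes[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha),
        # so larger clusters attract new items more often.
        u = rng.random() * (i + alpha)
        acc = 0.0
        for k, size in enumerate(sizes):
            acc += size
            if u < acc:
                sizes[k] += 1
                break
        else:
            # No existing cluster was chosen: open a new one.
            sizes.append(1)
    return sorted(sizes, reverse=True)


rng = random.Random(0)
sizes = sample_crp_sizes(1000, alpha=1.0, rng=rng)
print("number of clusters:", len(sizes))
print("five largest clusters:", sizes[:5])
```

Printing the sorted sizes for a few seeds gives a quick, informal view of how unevenly the items are spread across clusters.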

preprint · 2021 · arXiv

Risk Based Arsenic Rational Sampling Design for Public and Environmental Health Management

Groundwater contaminated with arsenic has been recognized as a global threat, which negatively impacts human health. Populations that rely on private wells for their drinking water are vulnerable to the potential arsenic-related health risks such as cancer and birth defects. Arsenic exposure through drinking water is one of the primary arsenic exposure routes that can be effectively managed by active testing and water treatment. From the public and environmental health management perspective, it is critical to allocate the limited resources to establish an effective arsenic sampling and testing plan for health risk mitigation. We present a spatially adaptive sampling design approach based on an estimation of the spatially varying underlying contamination distribution. The method is different from traditional sampling design methods that often rely on a spatially constant or smoothly varying contamination distribution. In contrast, we propose a statistical regularization method to automatically detect spatial clusters of the underlying contamination risk from the currently available private well arsenic testing data in Iowa, USA. This approach allows us to develop a sampling design method that is adaptive to the changes in the contamination risk across the identified clusters. We provide the spatially adaptive sample size calculation and sampling location determination at different acceptance precision and confidence levels for each cluster. The spatially adaptive sampling approach may effectively mitigate the arsenic risk from a resource management perspective. The model presents a framework that can be widely used for other environmental contaminant monitoring and sampling for public and environmental health.
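To illustrate what a per-cluster sample size calculation at a given precision and confidence level can look like, here is a minimal sketch using the textbook normal-approximation formula for a binomial proportion, n = z^2 p (1 - p) / d^2. This is an assumed stand-in, not the paper's calculation, and the cluster names and exceedance rates below are hypothetical values invented for the example.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, margin, confidence):
    """Samples needed to estimate a proportion p to within +/- margin.

    Uses the standard normal approximation; this is a textbook formula
    assumed for illustration, not the method from the paper.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))


# Hypothetical clusters with assumed arsenic exceedance rates.
cluster_rates = {"cluster_A": 0.50, "cluster_B": 0.10}

for name, rate in cluster_rates.items():
    n = sample_size_for_proportion(rate, margin=0.05, confidence=0.95)
    print(name, n)
```

Under this formula, clusters whose estimated rate is closer to 0.5 need more samples for the same precision, which is one simple way a design can adapt to differences in risk between clusters.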

preprint · 2015 · arXiv

Cognitive Learning of Statistical Primary Patterns via Bayesian Network

In cognitive radio (CR) technology, the trend of sensing is no longer to only detect the presence of active primary users. A large number of applications demand more comprehensive knowledge of primary user behaviors in spatial, temporal, and frequency domains. To satisfy such requirements, we study the statistical relationship among primary users by introducing a Bayesian network (BN) based framework. How to learn such a BN structure is a long-standing issue, not fully understood even in the statistical learning community. Besides, another key problem in this learning scenario is that the CR has to identify how many variables are in the BN, which is usually considered as prior knowledge in statistical learning applications. To solve these two issues simultaneously, this paper proposes a BN structure learning scheme consisting of an efficient structure learning algorithm and a blind variable identification scheme. The proposed approach incurs significantly lower computational complexity compared with previous ones, and is capable of determining the structure without assuming much prior knowledge about variables. With this result, cognitive users could efficiently understand the statistical pattern of primary networks, such that more efficient cognitive protocols could be designed across different network layers.

preprint · 2012 · arXiv

Covariance approximation for large multivariate spatial data sets with an application to multiple climate model errors

This paper investigates the cross-correlations across multiple climate model errors. We build a Bayesian hierarchical model that accounts for the spatial dependence of individual models as well as cross-covariances across different climate models. Our method allows for a nonseparable and nonstationary cross-covariance structure. We also present a covariance approximation approach to facilitate the computation in the modeling and analysis of very large multivariate spatial data sets. The covariance approximation consists of two parts: a reduced-rank part to capture the large-scale spatial dependence, and a sparse covariance matrix to correct the small-scale dependence error induced by the reduced rank approximation. We pay special attention to the case that the second part of the approximation has a block-diagonal structure. Simulation results of model fitting and prediction show substantial improvement of the proposed approximation over the predictive process approximation and the independent blocks analysis. We then apply our computational approach to the joint statistical modeling of multiple climate model errors.
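The two-part approximation described above (a reduced-rank part plus a block-diagonal correction) can be sketched for a simple univariate case. Everything specific here is an assumption made for illustration: the exponential covariance function, the 1-D locations, the knot placement, the four blocks, and the helper names `exp_cov` and `low_rank_plus_block`. The paper's multivariate, nonstationary model is not reproduced.

```python
import numpy as np


def exp_cov(a, b, length_scale=0.2):
    """Exponential covariance between 1-D location arrays a and b (assumed kernel)."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / length_scale)


def low_rank_plus_block(locs, knots, block_ids, length_scale=0.2):
    """Return the exact covariance, its reduced-rank part, and the corrected approximation."""
    # The exact covariance is formed only for comparison in this toy example;
    # a real implementation would avoid building it.
    c_full = exp_cov(locs, locs, length_scale)
    c_sk = exp_cov(locs, knots, length_scale)
    c_kk = exp_cov(knots, knots, length_scale)
    # Reduced-rank part: C_sk C_kk^{-1} C_ks captures large-scale dependence.
    low_rank = c_sk @ np.linalg.solve(c_kk, c_sk.T)
    # Sparse part: keep the residual only between locations in the same block.
    same_block = block_ids[:, None] == block_ids[None, :]
    correction = np.where(same_block, c_full - low_rank, 0.0)
    return c_full, low_rank, low_rank + correction


rng = np.random.default_rng(0)
locs = np.sort(rng.uniform(0.0, 1.0, size=200))
knots = np.linspace(0.0, 1.0, 10)                   # assumed knot locations
block_ids = np.minimum((locs * 4).astype(int), 3)   # four equal-width blocks

c_full, low_rank, approx = low_rank_plus_block(locs, knots, block_ids)
print("reduced-rank error:", np.linalg.norm(c_full - low_rank))
print("corrected error:   ", np.linalg.norm(c_full - approx))
```

Within each block the corrected matrix matches the exact covariance, so the remaining error comes only from pairs of locations in different blocks.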