Source author record

Chun Li

Chun Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Methodology · cond-mat.mtrl-sci · physics.comp-ph · Computation · Machine Learning

Catalog footprint

What is connected

6 works

5 topics

4 close collaborators


Preprint · 2022 · arXiv

Addressing Detection Limits with Semiparametric Cumulative Probability Models

Detection limits (DLs), where a variable is unable to be measured outside of a certain range, are common in research. Most approaches to handle DLs in the response variable implicitly make parametric assumptions on the distribution of data outside DLs. We propose a new approach to deal with DLs based on a widely used ordinal regression model, the cumulative probability model (CPM). The CPM is a type of semiparametric linear transformation model. CPMs are rank-based and can handle mixed distributions of continuous and discrete outcome variables. These features are key for analyzing data with DLs because while observations inside DLs are typically continuous, those outside DLs are censored and generally put into discrete categories. With a single lower DL, the CPM assigns values below the DL as having the lowest rank. When there are multiple DLs, the CPM likelihood can be modified to appropriately distribute probability mass. We demonstrate the use of CPMs with simulations and two HIV data examples. The first example models a biomarker in which 15% of observations are below a DL. The second uses multi-cohort data to model viral load, where approximately 55% of observations are outside DLs which vary across sites and over time.
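The single-lower-DL case described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rank_with_lower_dl` and the convention that category 0 stands for "below the DL" are assumptions made for illustration. It only shows the rank-based idea that every observation below the DL shares the lowest ordinal category, while observed values keep their order.

```python
def rank_with_lower_dl(values, lower_dl):
    """Map outcomes to ordinal categories for a rank-based model.

    Hypothetical helper: every value below `lower_dl` is treated as censored
    and placed in the lowest category (0); observed values are ranked
    1, 2, ... by their distinct sorted values.
    """
    observed = sorted({v for v in values if v >= lower_dl})
    category = {v: i + 1 for i, v in enumerate(observed)}
    return [0 if v < lower_dl else category[v] for v in values]


if __name__ == "__main__":
    # Two measurements fall below the detection limit of 1.0.
    print(rank_with_lower_dl([0.5, 2.0, 1.0, 3.0, 0.2, 2.0], lower_dl=1.0))
```

Because only the ordering of the categories matters to a rank-based model, the numeric value assigned to the censored group is arbitrary as long as it sits below every observed value.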

Preprint · 2022 · arXiv

Analyzing Clustered Continuous Response Variables with Ordinal Regression Models

Continuous response variables often need to be transformed to meet regression modeling assumptions; however, finding the optimal transformation is challenging and results may vary with the choice of transformation. When a continuous response variable is measured repeatedly for a subject or the continuous responses arise from clusters, it is more challenging to model the continuous response data due to correlation within clusters. We extend a widely used ordinal regression model, the cumulative probability model (CPM), to fit clustered continuous response variables based on generalized estimating equation (GEE) methods for ordinal responses. With our approach, estimates of marginal parameters, cumulative distribution functions (CDFs), expectations, and quantiles conditional on covariates can be obtained without pre-transformation of the potentially skewed continuous response data. Computational challenges arise with large numbers of distinct values of the continuous response variable, and we propose two feasible and computationally efficient approaches to fit CPMs for clustered continuous response variables with different working correlation structures. We study finite sample operating characteristics of the estimators via simulation, and illustrate their implementation with two data examples. One studies predictors of CD4:CD8 ratios in an HIV study. The other uses data from The Lung Health Study to investigate the contribution of a single nucleotide polymorphism to lung function decline.

Preprint · 2022 · arXiv

Fitting Semiparametric Cumulative Probability Models for Big Data

Cumulative probability models (CPMs) are a robust alternative to linear models for continuous outcomes. However, they are not feasible for very large datasets due to elevated running time and memory usage, which depend on the sample size, the number of predictors, and the number of distinct outcomes. We describe three approaches to address this problem. In the divide-and-combine approach, we divide the data into subsets, fit a CPM to each subset, and then aggregate the information. In the binning and rounding approaches, the outcome variable is redefined to have a greatly reduced number of distinct values. We consider rounding to a decimal place and rounding to significant digits, both with a refinement step to help achieve the desired number of distinct outcomes. We show with simulations that these approaches perform well and their parameter estimates are consistent. We investigate how running time and peak memory usage are influenced by the sample size, the number of distinct outcomes, and the number of predictors. As an illustration, we apply the approaches to a large publicly available dataset investigating matrix multiplication runtime with nearly one million observations.
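The rounding idea can be sketched as follows. This is an illustrative example, not the paper's code: the helper names `round_sig` and `count_distinct_after_rounding` are hypothetical, and the refinement step mentioned in the abstract is not reproduced here. The sketch only shows how rounding to significant digits shrinks the number of distinct outcome values.

```python
import math


def round_sig(x, digits):
    """Round x to `digits` significant digits (hypothetical helper)."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - exponent)


def count_distinct_after_rounding(values, digits):
    """Return the rounded outcomes and how many distinct values remain."""
    rounded = [round_sig(v, digits) for v in values]
    return rounded, len(set(rounded))


if __name__ == "__main__":
    outcomes = [1.234, 1.236, 12.31, 12.34, 123.4]
    rounded, n_distinct = count_distinct_after_rounding(outcomes, digits=2)
    print(rounded, n_distinct)
```

Rounding to significant digits rather than to a fixed decimal place keeps relative precision roughly constant across outcomes of very different magnitude, which is one plausible reason to consider both variants.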

Preprint · 2020 · arXiv

Multiferroic Decorated Fe2O3 Monolayer Predicted from First Principles

Two-dimensional (2D) multiferroics exhibit cross-control between magnetic and electric responses in a reduced spatial domain, making them well suited for next-generation nanoscale devices; however, progress has been slow in developing materials with the required properties. Here we identify by first-principles calculations robust 2D multiferroic behavior in decorated Fe2O3 monolayers, showcasing N@Fe2O3 as a prototypical case, where ferroelectricity and ferromagnetism stem from the same origin, namely Fe d-orbital splitting induced by the Jahn-Teller distortion and the associated crystal field changes. The resulting ferromagnetic and ferroelectric polarization can be effectively reversed and regulated by an applied electric field or strain, offering efficient functionality. These findings establish strong materials phenomena and elucidate the underlying physical mechanism in a family of truly 2D multiferroics that are highly promising for advanced device applications.

Preprint · 2020 · arXiv

Multinomial Random Forest: Toward Consistency and Privacy-Preservation

Despite the impressive performance of random forests (RF), their theoretical properties have not been thoroughly understood. In this paper, we propose a novel RF framework, dubbed multinomial random forest (MRF), to analyze consistency and privacy preservation. Instead of a deterministic greedy split rule or one with simple randomness, the MRF adopts two impurity-based multinomial distributions to randomly select a split feature and a split value, respectively. Theoretically, we prove the consistency of the proposed MRF and analyze its privacy preservation within the framework of differential privacy. We also demonstrate with multiple datasets that its performance is on par with the standard RF. To the best of our knowledge, MRF is the first consistent RF variant that has comparable performance to the standard RF.
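A randomized, impurity-based split choice of the kind described above might look like the following sketch. The exact form of the paper's multinomial distributions is not given in the abstract, so the softmax weighting and the `scale` parameter here are assumptions, as are the function names. The sketch only illustrates drawing a split candidate with probability that increases with its impurity reduction, instead of always taking the greedy maximum.

```python
import math
import random


def split_probabilities(impurity_gains, scale=1.0):
    """Turn impurity reductions into multinomial probabilities.

    Assumption: a softmax with a hypothetical `scale` parameter; larger
    `scale` pushes the draw toward the greedy (maximum-gain) choice.
    """
    top = max(impurity_gains)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(scale * (g - top)) for g in impurity_gains]
    total = sum(weights)
    return [w / total for w in weights]


def sample_split(candidates, impurity_gains, scale=1.0, rng=random):
    """Draw one split candidate at random, weighted by impurity reduction."""
    probs = split_probabilities(impurity_gains, scale)
    return rng.choices(candidates, weights=probs, k=1)[0]


if __name__ == "__main__":
    features = ["age", "income", "height"]
    gains = [0.1, 0.4, 0.2]
    print(split_probabilities(gains, scale=5.0))
    print(sample_split(features, gains, scale=5.0, rng=random.Random(0)))
```

The same routine could be applied twice per node, once over candidate features and once over candidate split values, which mirrors the two-distribution structure the abstract mentions.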

Preprint · 2017 · arXiv

Molecular Dynamics Simulations for Anisotropic Thermal Conductivity of Borophene

The present work carries out molecular dynamics simulations to compute the thermal conductivity of the borophene nanoribbon and the borophene nanotube using the Müller-Plathe approach. We investigate the thermal conductivity of armchair and zigzag borophene, and show that the thermal conductivity of borophene is strongly anisotropic. We compare the results for the borophene nanoribbon and the borophene nanotube, and find that the thermal conductivity of borophene is structure-dependent.
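For orientation, the final step of a Müller-Plathe calculation can be sketched as below. This is a generic illustration rather than the paper's workflow: the function name and arguments are hypothetical, and it assumes the usual periodic setup in which the imposed heat flows in two directions from the hot slab, giving the factor of 2 in the flux. The conductivity is the heat flux divided by the magnitude of the resulting temperature gradient.

```python
def muller_plathe_conductivity(exchanged_energy, elapsed_time, cross_section, temp_gradient):
    """Thermal conductivity from a reverse non-equilibrium MD run.

    Assumes consistent units, for example J, s, m^2 and K/m, giving W/(m K).
    exchanged_energy: total kinetic energy transferred by velocity swaps
    elapsed_time:     simulation time over which the swaps were accumulated
    cross_section:    area perpendicular to the heat flow
    temp_gradient:    fitted temperature gradient along the flow direction
    """
    if elapsed_time <= 0 or cross_section <= 0:
        raise ValueError("elapsed_time and cross_section must be positive")
    if temp_gradient == 0:
        raise ValueError("temp_gradient must be non-zero")
    # Factor 2: under periodic boundaries heat leaves the hot slab both ways.
    heat_flux = exchanged_energy / (2.0 * elapsed_time * cross_section)
    return heat_flux / abs(temp_gradient)


if __name__ == "__main__":
    print(muller_plathe_conductivity(4.0, 2.0, 1.0, 0.5))
```

Running the same calculation with heat flow along the armchair and the zigzag direction would give the two values whose difference expresses the anisotropy.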
