Researcher profile

Nan Liu

Nan Liu contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) | Verification L1 | Unclaimed author
11 works
0 followers
13 topics
4 close collaborators

Published work

11 published items

Preprint | 2026 | arXiv

Resolving the bias-precision paradox with stochastic causal representation learning for personalized medicine

Estimating individualized treatment effects from longitudinal observational data is central to data-driven medicine, yet existing methods face a fundamental limitation: reducing confounding bias often suppresses clinically informative heterogeneity, degrading patient-specific predictions. Here, we identify this tension as a bias-precision paradox in causal representation learning and introduce sampling-based maximum mean discrepancy (sMMD), a stochastic alignment strategy that replaces global adversarial balancing with subset-level matching. We instantiate this approach in a framework for counterfactual outcome prediction with attribution-grounded interpretability. Across two large-scale ICU cohorts (n = 27,783), our framework improves accuracy under distribution shift, reducing error by up to 11.5% and substantially increasing recall in high-risk tasks. Mechanistic analyses show that sMMD selectively preserves clinically decisive variables. In human-AI evaluation, our method outperforms clinicians-in-training and large language models, and improves clinician accuracy by 14.7% while reducing decision time, enabling interpretable, real-time clinical decision support.
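The subset-level matching idea can be illustrated with a minimal sketch. The abstract does not give the exact sMMD formulation, so everything below is an assumption made for illustration: the RBF kernel, the biased MMD estimator, uniform random subsets, and the hypothetical function names (`mmd2`, `sampled_mmd2`). It only shows the general pattern of averaging a maximum mean discrepancy over randomly drawn subsets instead of computing one global statistic.

```python
import math
import random

def rbf(u, v, bandwidth=1.0):
    """RBF kernel between two points given as equal-length tuples."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * bandwidth ** 2))

def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y)."""
    total = sum(rbf(u, v, bandwidth) for u in xs for v in ys)
    return total / (len(xs) * len(ys))

def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))

def sampled_mmd2(xs, ys, subset_size, n_draws, rng, bandwidth=1.0):
    """Average MMD^2 over random subsets (assumed sampling scheme)."""
    total = 0.0
    for _ in range(n_draws):
        sub_x = rng.sample(xs, subset_size)  # subset of group 1
        sub_y = rng.sample(ys, subset_size)  # subset of group 2
        total += mmd2(sub_x, sub_y, bandwidth)
    return total / n_draws

rng = random.Random(0)
treated = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
control = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(60)]
discrepancy = sampled_mmd2(treated, control, subset_size=20, n_draws=5, rng=rng)
```

In a representation-learning setting, a quantity like `discrepancy` would typically be added to the training loss as an alignment penalty; how the paper weights or schedules it is not stated in the abstract.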

Preprint | 2023 | arXiv

A Framework for Mutual Information-based MIMO Integrated Sensing and Communication Beamforming Design

Integrated sensing and communication (ISAC) unifies sensing and communication, and improves the efficiency of the spectrum, energy, and hardware. In this work, we investigate the ISAC beamforming design to maximize the mutual information between the target response matrix of a point radar target and the echo signals, while ensuring the data rate requirements of the communication users. In order to study the impact of the echo interference caused by communication users on sensing performance, we study two scenarios: a single communication user and multiple communication users. For the case of a single communication user, we consider three types of echo interference: no interference, a point interference, and an extended interference. For the case of multiple communication users, the interference is also an extended one, and furthermore, each user's communication rate requirement needs to be satisfied. To find the optimal beamforming design in these problems, we provide a closed-form solution with low complexity, a semidefinite relaxation (SDR) method, a low-complexity algorithm based on the Majorization-Minimization (MM) method and the successive convex approximation (SCA) method, and an algorithm based on the MM and SCA methods, respectively. Numerical results demonstrate that, compared to the ISAC beamforming schemes based on the Cramér-Rao bound (CRB) metric and the beampattern metric, the proposed mutual information metric can bring better beampattern and root mean square error (RMSE) of angle estimation. Furthermore, our proposed schemes designed based on the mutual information metric can suppress the echo interference from the communication users effectively.

Preprint | 2023 | arXiv

Compositional Visual Generation with Composable Diffusion Models

Large text-guided diffusion models, such as DALLE-2, are able to generate stunning photorealistic images given natural language descriptions. While such models are highly flexible, they struggle to understand the composition of certain concepts, such as confusing the attributes of different objects or relations between objects. In this paper, we propose an alternative structured approach for compositional generation using diffusion models. An image is generated by composing a set of diffusion models, with each of them modeling a certain component of the image. To do this, we interpret diffusion models as energy-based models in which the data distributions defined by the energy functions may be explicitly combined. The proposed method can generate scenes at test time that are substantially more complex than those seen in training, composing sentence descriptions, object relations, human facial attributes, and even generalizing to new combinations that are rarely seen in the real world. We further illustrate how our approach may be used to compose pre-trained text-guided diffusion models and generate photorealistic images containing all the details described in the input descriptions, including the binding of certain object attributes that have been shown difficult for DALLE-2. These results point to the effectiveness of the proposed method in promoting structured generalization for visual generation. Project page: https://energy-based-model.github.io/Compositional-Visual-Generation-with-Composable-Diffusion-Models/
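The statement that distributions defined by energy functions may be explicitly combined can be illustrated with a toy one-dimensional sketch. It rests on a standard identity: the score (gradient of the log-density) of a product of densities is the sum of the individual scores. The Gaussian components, the helper names (`gaussian_score`, `compose_scores`, `find_mode`), and the gradient-ascent loop are assumptions chosen for illustration; this is not the paper's sampler or its composition operators.

```python
def gaussian_score(mean, std):
    """Score function d/dx log N(x; mean, std^2) of a 1-D Gaussian."""
    def score(x):
        return -(x - mean) / (std ** 2)
    return score

def compose_scores(scores):
    """Score of the product of densities: the sum of the component scores."""
    def combined(x):
        return sum(s(x) for s in scores)
    return combined

def find_mode(score, x0=0.0, step=0.05, n_steps=2000):
    """Plain gradient ascent on the log-density using only its score."""
    x = x0
    for _ in range(n_steps):
        x += step * score(x)
    return x

# Two toy "concepts", each a Gaussian over a single scalar.
concept_a = gaussian_score(mean=0.0, std=1.0)
concept_b = gaussian_score(mean=4.0, std=1.0)
both = compose_scores([concept_a, concept_b])
mode = find_mode(both)  # converges to the point favoured jointly by both components
```

With equal variances the composed mode sits midway between the two means, which is the precision-weighted average expected for a product of Gaussians.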

Preprint | 2022 | arXiv

AutoScore-Ordinal: An interpretable machine learning framework for generating scoring models for ordinal outcomes

Background: Risk prediction models are useful tools in clinical decision-making which help with risk stratification and resource allocation and may lead to better health care for patients. AutoScore is a machine learning-based automatic clinical score generator for binary outcomes. This study aims to expand the AutoScore framework to provide a tool for interpretable risk prediction for ordinal outcomes. Methods: The AutoScore-Ordinal framework is generated using the same 6 modules of the original AutoScore algorithm including variable ranking, variable transformation, score derivation (from proportional odds models), model selection, score fine-tuning, and model evaluation. To illustrate the AutoScore-Ordinal performance, the method was applied to electronic health records data from the emergency department at Singapore General Hospital from 2008 to 2017. The model was trained on 70% of the data, validated on 10% and tested on the remaining 20%. Results: This study included 445,989 inpatient cases, where the distribution of the ordinal outcome was 80.7% alive without 30-day readmission, 12.5% alive with 30-day readmission, and 6.8% died inpatient or by day 30 post discharge. Two point-based risk prediction models were developed using two sets of 8 predictor variables identified by the flexible variable selection procedure. The two models indicated reasonably good performance measured by mean area under the receiver operating characteristic curve (0.785 and 0.793) and generalized c-index (0.737 and 0.760), which were comparable to alternative models. Conclusion: AutoScore-Ordinal provides an automated and easy-to-use framework for development and validation of risk prediction models for ordinal outcomes, which can systematically identify potential predictors from high-dimensional data.

Preprint | 2022 | arXiv

Balanced background and explanation data are needed in explaining deep learning models with SHAP: An empirical study on clinical decision making

Objective: Shapley additive explanations (SHAP) is a popular post-hoc technique for explaining black box models. While the impact of data imbalance on predictive models has been extensively studied, it remains largely unknown with respect to SHAP-based model explanations. This study sought to investigate the effects of data imbalance on SHAP explanations for deep learning models, and to propose a strategy to mitigate these effects. Materials and Methods: We propose to adjust class distributions in the background and explanation data in SHAP when explaining black box models. Our data balancing strategy is to compose background data and explanation data with an equal distribution of classes. To evaluate the effects of data adjustment on model explanation, we propose to use the beeswarm plot as a qualitative tool to identify "abnormal" explanation artifacts, and quantitatively test the consistency between variable importance and prediction power. We demonstrated our proposed approach in an empirical study that predicted inpatient mortality using the Medical Information Mart for Intensive Care (MIMIC-III) data and a multilayer perceptron. Results: Using the data balancing strategy allowed us to reduce the number of artifacts in the beeswarm plot, thus mitigating the negative effects of data imbalance. Additionally, with the balancing strategy, the top-ranked variables from the corresponding importance ranking demonstrated improved discrimination power. Discussion and Conclusion: Our findings suggest that balanced background and explanation data could help reduce the noise in explanation results induced by skewed data distribution and improve the reliability of variable importance ranking. Furthermore, these balancing procedures improve the potential of SHAP in identifying patients with abnormal characteristics in clinical applications.
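The balancing strategy described in Materials and Methods, composing data with an equal distribution of classes, can be sketched as a simple per-class sampling step. The abstract does not say how the authors draw the samples, so the helper `balanced_sample`, its parameters, and the sampling without replacement are assumptions for illustration. The sketch stops before any SHAP call: the balanced rows would be handed to whichever explainer is in use.

```python
import random
from collections import defaultdict

def balanced_sample(rows, labels, per_class, rng):
    """Draw the same number of rows from every class (hypothetical helper)."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    sample_rows, sample_labels = [], []
    for label in sorted(by_class):
        members = by_class[label]
        if len(members) < per_class:
            raise ValueError(f"class {label!r} has only {len(members)} rows")
        # Sample without replacement inside each class.
        for row in rng.sample(members, per_class):
            sample_rows.append(row)
            sample_labels.append(label)
    return sample_rows, sample_labels

# Toy imbalanced data: 90 negative cases and 10 positive cases.
rng = random.Random(0)
rows = [[rng.random(), rng.random()] for _ in range(100)]
labels = [0] * 90 + [1] * 10

background_rows, background_labels = balanced_sample(rows, labels, per_class=10, rng=rng)
```

The same helper could be called a second time to build the explanation set, since the abstract applies the equal-class composition to both the background and the explanation data.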

Preprint | 2022 | arXiv

Slow Neutron-Capture Process: Low-mass AGB stars and presolar silicon carbide grains

Presolar grains are microscopic dust grains that formed in the stellar winds or explosions of ancient stars that died before the formation of the solar system. The majority (~90% in number) of presolar silicon carbide (SiC) grains, including types mainstream (MS), Y, and Z, came from low-mass C-rich asymptotic giant branch (AGB) stars, which is supported by the ubiquitous presence of SiC dust observed in the circumstellar envelope of AGB stars and the signatures of slow neutron-capture process preserved in these grains. Here, we review the status of isotope studies of presolar AGB SiC grains with an emphasis on heavy-element isotopes and highlight the importance of presolar grain studies for nuclear astrophysics. We discuss the sensitivities of different types of nuclei to varying AGB stellar parameters and how their abundances in presolar AGB SiC grains can be used to provide independent, detailed constraints on stellar parameters, including 13C formation, stellar temperature, and nuclear reaction rates.

Preprint | 2022 | arXiv

Towards Practical Differential Privacy in Data Analysis: Understanding the Effect of Epsilon on Utility in Private ERM

In this paper, we focus our attention on private Empirical Risk Minimization (ERM), which is one of the most commonly used data analysis methods. We take the first step towards solving the above problem by theoretically exploring the effect of epsilon (the parameter of differential privacy that determines the strength of privacy guarantee) on utility of the learning model. We trace the change of utility with modification of epsilon and reveal an established relationship between epsilon and utility. We then formalize this relationship and propose a practical approach for estimating the utility under an arbitrary value of epsilon. Both theoretical analysis and experimental results demonstrate high estimation accuracy and broad applicability of our approach in practical applications. As providing algorithms with strong utility guarantees that also give privacy when possible becomes more and more accepted, our approach would have high practical value and is likely to be adopted by companies and organizations that would like to preserve privacy but are unwilling to compromise on utility.

Preprint | 2021 | arXiv

Oxygen and Aluminum-Magnesium Isotopic Systematics of Presolar Nanospinel Grains from CI Chondrite Orgueil

Presolar oxide grains have been previously divided into several groups (Groups 1 to 4) based on their isotopic compositions, which can be tied to several stellar sources. Much of the available data was acquired on large grains, which may not be fully representative of the presolar grain population present in meteorites. We present here new O isotopic data for 74 small presolar oxide grains (~200 nm in diameter on average) from Orgueil and Al-Mg isotopic systematics for 25 of the grains. Based on data-model comparisons, we show that (i) Group 1 and Group 2 grains more likely originated in low-mass first-ascent (red giant branch; RGB) and/or second-ascent (asymptotic giant branch; AGB) red giant stars and (ii) Group 1 grains with (26Al/27Al)0 >= 5x10^-3 and Group 2 grains with (26Al/27Al)0 <= 1x10^-2 all likely experienced extra circulation processes in their parent low-mass stars but under different conditions, resulting in proton-capture reactions occurring at enhanced temperatures. We do not find any large 25Mg excess in Group 1 oxide grains with large 17O enrichments, which provides evidence that 25Mg is not abundantly produced in low-mass stars. We also find that our samples contain a larger proportion of Group 4 grains than so far suggested in the literature for larger presolar oxide grains (~400 nm). We also discuss our observations in the light of stellar dust production mechanisms.

Preprint | 2020 | arXiv

Cluster analysis of presolar silicon carbide grains: evaluation of their classification and astrophysical implications

Cluster analysis of presolar silicon carbide grains based on literature data for 12C/13C, 14N/15N, δ30Si/28Si, and δ29Si/28Si, with or without inferred initial 26Al/27Al data, reveals nine clusters agreeing with previously defined grain types but also highlighting new divisions. Mainstream grains reside in three clusters probably representing different parent star metallicities. One of these clusters has a compact core, with a narrow range of composition, pointing to an enhanced production of SiC grains in asymptotic giant branch (AGB) stars with a narrow range of masses and metallicities. The addition of 26Al/27Al data highlights a cluster of mainstream grains, enriched in 15N and 26Al, which cannot be explained by current AGB models. We defined two AB grain clusters, one with 15N and 26Al excesses, and the other with 14N and smaller 26Al excesses, in agreement with recent studies. Their definition does not use the solar N isotopic ratio as a divider, and the contour of the 26Al-rich AB cluster identified in this study is in better agreement with core-collapse supernova models. We also found a cluster with a mixture of putative nova and AB grains, which may have formed in supernova or nova environments. X grains make up two clusters, having either strongly correlated Si isotopic ratios or deviating from the 2/3 slope line in the Si 3-isotope plot. Finally, most Y and Z grains are jointly clustered, suggesting that the previous use of 12C/13C = 100 as a divider for Y grains was arbitrary. Our results show that cluster analysis is a powerful tool to interpret the data in light of stellar evolution and nucleosynthesis modelling and highlight the need for more multi-element isotopic data for better classification.

Preprint | 2020 | arXiv

Magnetic-buoyancy Induced Mixing in AGB Stars: Presolar SiC Grains

Isotope ratios can be measured in presolar SiC grains from ancient Asymptotic Giant Branch (AGB) stars at permil-level (0.1%) precision. Such precise grain data permit derivation of more stringent constraints and calibrations on mixing efficiency in AGB models than traditional spectroscopic observations. In this paper we compare SiC heavy-element isotope ratios to a new series of FRUITY models that include the effects of mixing triggered by magnetic fields. Based on 2D and 3D simulations available in the literature, we propose a new formulation, upon which the general features of mixing induced by magnetic fields can be derived. The efficiency of such mixing, on the other hand, relies on physical quantities whose values are poorly constrained. We present here our calibration by comparing our model results with the heavy-element isotope data of presolar SiC grains from AGB stars. We demonstrate that the isotopic compositions of all measured elements (Ni, Sr, Zr, Mo, Ba) can be simultaneously fitted by adopting a single magnetic field configuration in our new FRUITY models.

Preprint | 2020 | arXiv

The Capacity of Private Information Retrieval Under Arbitrary Collusion Patterns

We study the private information retrieval (PIR) problem under an arbitrary collusion pattern for replicated databases. We find its capacity, which is the same as the capacity of the original PIR problem with the number of databases $N$ replaced by a number $S^*$. The number $S^*$ is the optimal solution to a linear programming problem that is a function of the collusion pattern. Hence, the collusion pattern affects the capacity of the PIR problem only through the number $S^*$.
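The substitution described above can be made concrete with a short sketch. It assumes the well-known capacity expression for the original PIR problem with $N$ replicated databases and $K$ messages, $C = (1 + 1/N + \dots + 1/N^{K-1})^{-1}$. The symbol $K$ and the function name `pir_capacity` do not appear in the abstract and are introduced here for illustration. The value of $S^*$ is treated as a given input, since the linear program that produces it is not specified in the abstract.

```python
from fractions import Fraction

def pir_capacity(s, k):
    """Capacity (1 + 1/s + ... + 1/s**(k-1))**-1 for k messages.

    Pass s = N for the original problem, or s = S* for the collusion
    setting described in the abstract (S* is assumed to be given).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if s < 1:
        raise ValueError("s must be at least 1")
    return 1 / sum(1 / s ** i for i in range(k))

# Original problem with N = 2 databases and K = 2 messages.
original = pir_capacity(Fraction(2), 2)       # Fraction(2, 3)

# Hypothetical collusion pattern for which the linear program gives S* = 3/2.
colluding = pir_capacity(Fraction(3, 2), 2)   # Fraction(3, 5)
```

Using `Fraction` keeps the result exact when $S^*$ is rational; plain floats also work if an approximate value is enough.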