Researcher profile

Han Bao

Han Bao contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
18works
0followers
12topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

18 published item(s)

preprint2026arXiv

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

General reasoning represents a long-standing and formidable challenge in artificial intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-thought prompting, have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent upon extensive human-annotated demonstrations, and models' capabilities are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labeled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification, and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions, and STEM fields, surpassing its counterparts trained via conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically harnessed to guide and enhance the reasoning capabilities of smaller models.

preprint2026arXiv

SkillGen: Verified Inference-Time Agent Skill Synthesis

Skills are a promising way to improve LLM agent capabilities without retraining, while keeping the added procedure reusable and controllable. However, high-quality skills are still largely written by hand. We introduce SkillGen, a multi-agent framework that synthesizes a single auditable skill from trajectories generated by a base agent. The output is a human-readable artifact that can be inspected before use. Rather than merely summarizing trajectories, SkillGen leverages contrastive induction over both successful and failed trajectories to identify reusable success patterns, recurring failure modes, and behaviors that appear in nearby successes but are missing from failures. SkillGen then generates candidate skills and iteratively refines the skill. A key novelty in SkillGen is that we model agent skills as interventions to empirically verify the net effect of skills on the overall performance. Specifically, we compare outcomes on the same instances with and without the skill, so that we account for both repairs (cases where the skill fixes a baseline failure) and regressions (cases where the skill breaks a baseline success). Across a broad range of agents and datasets, SkillGen consistently improves held-out performance, outperforms existing skill-generation baselines, and produces skills that transfer across models.

preprint2022arXiv

A Semi-Supervised Framework for Automatic Pixel-Wise Breast Cancer Grading of Histological Images

Throughout the world, breast cancer is one of the leading causes of female death. Recently, deep learning methods are developed to automatically grade breast cancer of histological slides. However, the performance of existing deep learning models is limited due to the lack of large annotated biomedical datasets. One promising way to relieve the annotating burden is to leverage the unannotated datasets to enhance the trained model. In this paper, we first apply active learning method in breast cancer grading, and propose a semi-supervised framework based on expectation maximization (EM) model. The proposed EM approach is based on the collaborative filtering among the annotated and unannotated datasets. The collaborative filtering method effectively extracts useful and credible datasets from the unannotated images. Results of pixel-wise prediction of whole-slide images (WSI) demonstrate that the proposed method not only outperforms state-of-art methods, but also significantly reduces the annotation cost by over 70%.

preprint2022arXiv

An Application of a Modified Beta Factor Method for the Analysis of Software Common Cause Failures

This paper presents an approach for modeling software common cause failures (CCFs) within digital instrumentation and control (I&C) systems. CCFs consist of a concurrent failure between two or more components due to a shared failure cause and coupling mechanism. This work emphasizes the importance of identifying software-centric attributes related to the coupling mechanisms necessary for simultaneous failures of redundant software components. The groups of components that share coupling mechanisms are called common cause component groups (CCCGs). Most CCF models rely on operational data as the basis for establishing CCCG parameters and predicting CCFs. This work is motivated by two primary concerns: (1) a lack of operational and CCF data for estimating software CCF model parameters; and (2) the need to model single components as part of multiple CCCGs simultaneously. A hybrid approach was developed to account for these concerns by leveraging existing techniques: a modified beta factor model allows single components to be placed within multiple CCCGs, while a second technique provides software-specific model parameters for each CCCG. This hybrid approach provides a means to overcome the limitations of conventional methods while offering support for design decisions under the limited data scenario.

preprint2022arXiv

Application of Orthogonal Defect Classification for Software Reliability Analysis

The modernization of existing and new nuclear power plants with digital instrumentation and control systems (DI&C) is a recent and highly trending topic. However, there lacks strong consensus on best-estimate reliability methodologies by both the United States (U.S.) Nuclear Regulatory Commission (NRC) and the industry. In this work, we develop an approach called Orthogonal-defect Classification for Assessing Software Reliability (ORCAS) to quantify probabilities of various software failure modes in a DI&C system. The method utilizes accepted industry methodologies for quality assurance that are verified by experimental evidence. In essence, the approach combines a semantic failure classification model with a reliability growth model to predict the probability of failure modes of a software system. A case study was conducted on a representative I&C platform (ChibiOS) running a smart sensor acquisition software developed by Virginia Commonwealth University (VCU). The testing and evidence collection guidance in ORCAS was applied, and defects were uncovered in the software. Qualitative evidence, such as modified condition decision coverage, was used to gauge the completeness and trustworthiness of the assessment while quantitative evidence was used to determine the software failure probabilities. The reliability of the software was then estimated and compared to existing operational data of the sensor device. It is demonstrated that by using ORCAS, a semantic reasoning framework can be developed to justify if the software is reliable (or unreliable) while still leveraging the strength of the existing methods.

preprint2022arXiv

Approximating 1-Wasserstein Distance with Trees

Wasserstein distance, which measures the discrepancy between distributions, shows efficacy in various types of natural language processing (NLP) and computer vision (CV) applications. One of the challenges in estimating Wasserstein distance is that it is computationally expensive and does not scale well for many distribution comparison tasks. In this paper, we aim to approximate the 1-Wasserstein distance by the tree-Wasserstein distance (TWD), where TWD is a 1-Wasserstein distance with tree-based embedding and can be computed in linear time with respect to the number of nodes on a tree. More specifically, we propose a simple yet efficient L1-regularized approach to learning the weights of the edges in a tree. To this end, we first show that the 1-Wasserstein approximation problem can be formulated as a distance approximation problem using the shortest path distance on a tree. We then show that the shortest path distance can be represented by a linear model and can be formulated as a Lasso-based regression problem. Owing to the convex formulation, we can obtain a globally optimal solution efficiently. Moreover, we propose a tree-sliced variant of these methods. Through experiments, we demonstrated that the weighted TWD can accurately approximate the original 1-Wasserstein distance.

preprint2022arXiv

Failure Mechanism Traceability and Application in Human System Interface of Nuclear Power Plants using RESHA

In recent years, there has been considerable effort to modernize existing and new nuclear power plants with digital instrumentation and control systems. However, there has also been considerable concern both by industry and regulatory bodies for the risk and consequence analysis of these systems. Of concern are digital common cause failures specifically due to software defects. These failures by the software can occur in both the control and monitoring of a system. While many methods have been proposed to identify software failure modes, such as Systems Theoretic Process Analysis, Hazard and Consequence Analysis for Digital Systems, etc., these methods are focused primarily on the control action pathway of a system. In contrast, the information feedback pathway lacks Unsafe Control Actions, which are typically related to software basic events; thus, assessment of software basic events in such systems is unclear. In this work, we present the idea of intermediate processors and Unsafe Information Flow (UIF) to help safety analysts trace failure mechanisms in the feedback pathway and how they can be integrated into a fault tree for improved assessment capability. The concepts presented are demonstrated in two comprehensive case studies, a smart sensor integrated platform for unmanned autonomous vehicles and another on a representative advanced human system interface for safety critical plant monitoring. The qualitative software basic events are identified, and a fault tree analysis is conducted based on a modified Redundancy guided Systems theoretic Hazard Analysis methodology. The case studies demonstrate the use of UIFs and intermediate processors in the fault tree to improve traceability of software failures in highly complex digital instrumentation feedback. The improved method clarifies fault tree construction when multiple component dependencies are present in the system.

preprint2022arXiv

On the Surrogate Gap between Contrastive and Supervised Losses

Contrastive representation learning encourages data representation to make semantically similar pairs closer than randomly drawn negative samples, which has been successful in various domains such as vision, language, and graphs. Recent theoretical studies have attempted to explain the benefit of the large negative sample size by upper-bounding the downstream classification loss with the contrastive loss. However, the previous surrogate bounds have two drawbacks: they are only legitimate for a limited range of negative sample sizes and prohibitively large even within that range. Due to these drawbacks, there still does not exist a consensus on how negative sample size theoretically correlates with downstream classification performance. Following the simplified setting where positive pairs are drawn from the true distribution (not generated by data augmentation; as supposed in previous studies), this study establishes surrogate upper and lower bounds for the downstream classification loss for all negative sample sizes that best explain the empirical observations on the negative sample size in the earlier studies. Our bounds suggest that the contrastive loss can be viewed as a surrogate objective of the downstream loss and larger negative sample sizes improve downstream classification because the surrogate gap between contrastive and supervised losses decays. We verify that our theory is consistent with experiments on synthetic, vision, and language datasets.

preprint2022arXiv

Pairwise Supervision Can Provably Elicit a Decision Boundary

Similarity learning is a general problem to elicit useful representations by predicting the relationship between a pair of patterns. This problem is related to various important preprocessing tasks such as metric learning, kernel learning, and contrastive learning. A classifier built upon the representations is expected to perform well in downstream classification; however, little theory has been given in literature so far and thereby the relationship between similarity and classification has remained elusive. Therefore, we tackle a fundamental question: can similarity information provably leads a model to perform well in downstream classification? In this paper, we reveal that a product-type formulation of similarity learning is strongly related to an objective of binary classification. We further show that these two different problems are explicitly connected by an excess risk bound. Consequently, our results elucidate that similarity learning is capable of solving binary classification by directly eliciting a decision boundary.

preprint2022arXiv

Quantitative Evaluation of Common Cause Failures in High Safety-significant Safety-related Digital Instrumentation and Control Systems in Nuclear Power Plants

Digital instrumentation and control (DIC) systems at nuclear power plants (NPPs) have many advantages over analog systems. They are proven to be more reliable, cheaper, and easier to maintain given obsolescence of analog components. However, they also pose new engineering and technical challenges, such as possibility of common cause failures (CCFs) unique to digital systems. This paper proposes a Platform for Risk Assessment of DIC (PRADIC) that is developed by Idaho National Laboratory (INL). A methodology for evaluation of software CCFs in high safety-significant safety-related DIC systems of NPPs was developed as part of the framework. The framework integrates three stages of a typical risk assessment, qualitative hazard analysis and quantitative reliability and consequence analyses. The quantified risks compared with respective acceptance criteria provide valuable insights for system architecture alternatives allowing design optimization in terms of risk reduction and cost savings. A comprehensive case study performed to demonstrate the framework capabilities is documented in this paper. Results show that the PRADIC is a powerful tool capable to identify potential digital-based CCFs, estimate their probabilities, and evaluate their impacts on system and plant safety.

preprint2022arXiv

Systems-theoretic Hazard Analysis of Digital Human-System Interface Relevant to Reactor Trip

Human-system interface is one of the key advanced design features applied to modern digital instrumentation and control systems of nuclear power plants. The conventional design is based on a compact workstation-based system within the control room. The compact workstation provides both a strategic operating environment while also a convenient display for plant status information necessary to the operator. The control environment is further enhanced through display panels, visual and auditory alarms, and procedure systems. However, just like the legacy control, the HSI should incorporate diversity to demonstrate sufficient defense-in-depth protection against common cause failures of the safety system. Furthermore, the vulnerability of the HSI is affected by a plethora of factors, such as human error, cyberattacks, software common cause failures, etc., that complicate the design and analysis. Therefore, this work aims to identify and evaluate existing system vulnerabilities to support the licensing, deployment and operation of HSI designs, especially the functions that are relevant to a reactor trip. We performed a systematic hazard analysis to investigate potential vulnerabilities within the HSI design using the novel redundancy-guided systems-theoretic hazard analysis. This method was developed and demonstrated by Idaho National Laboratory under a project initiated by the Risk-Informed Systems Analysis Pathway of the U.S. Department of Energy's Light Water Reactor Sustainability Program. The goal of the project is to develop a strong technical basis for risk assessment strategies to support effective, reliable, and licensable digital instrumentation and control technologies.

preprint2021arXiv

Uncertainty Quantification and Software Risk Analysis for Digital Twins in the Nearly Autonomous Management and Control Systems: A Review

A nearly autonomous management and control (NAMAC) system is designed to furnish recommendations to operators for achieving particular goals based on NAMAC's knowledge base. As a critical component in a NAMAC system, digital twins (DTs) are used to extract information from the knowledge base to support decision-making in reactor control and management during all modes of plant operations. With the advancement of artificial intelligence and data-driven methods, machine learning algorithms are used to build DTs of various functions in the NAMAC system. To evaluate the uncertainty of DTs and its impacts on the reactor digital instrumentation and control systems, uncertainty quantification (UQ) and software risk analysis is needed. As a comprehensive overview of prior research and a starting point for new investigations, this study selects and reviews relevant UQ techniques and software hazard and software risk analysis methods that may be suitable for DTs in the NAMAC system.

preprint2020arXiv

A Redundancy-Guided Approach for the Hazard Analysis of Digital Instrumentation and Control Systems in Advanced Nuclear Power Plants

Digital instrumentation and control (I&C) upgrades are a vital research area for nuclear industry. Despite their performance benefits, deployment of digital I&C in nuclear power plants (NPPs) has been limited. Digital I&C systems exhibit complex failure modes including common cause failures (CCFs) which can be difficult to identify. This paper describes the development of a redundancy-guided application of the Systems-Theoretic Process Analysis (STPA) and Fault Tree Analysis (FTA) for the hazard analysis of digital I&C in advanced NPPs. The resulting Redundancy-guided System-theoretic Hazard Analysis (RESHA) is applied for the case study of a representative state-of-the-art digital reactor trip system. The analysis qualitatively and systematically identifies the most critical CCFs and other hazards of digital I&C systems. Ultimately, RESHA can help researchers make informed decisions for how, and to what degree, defensive measures such as redundancy, diversity, and defense-in-depth can be used to mitigate or eliminate the potential hazards of digital I&C systems.

preprint2020arXiv

Calibrated Surrogate Maximization of Linear-fractional Utility in Binary Classification

Complex classification performance metrics such as the F${}_β$-measure and Jaccard index are often used, in order to handle class-imbalanced cases such as information retrieval and image segmentation. These performance metrics are not decomposable, that is, they cannot be expressed in a per-example manner, which hinders a straightforward application of M-estimation widely used in supervised learning. In this paper, we consider linear-fractional metrics, which are a family of classification performance metrics that encompasses many standard ones such as the F${}_β$-measure and Jaccard index, and propose methods to directly maximize performances under those metrics. A clue to tackle their direct optimization is a calibrated surrogate utility, which is a tractable lower bound of the true utility function representing a given metric. We characterize sufficient conditions which make the surrogate maximization coincide with the maximization of the true utility. Simulation results on benchmark datasets validate the effectiveness of our calibrated surrogate maximization especially if the sample sizes are extremely small.

preprint2020arXiv

Deep Learning Interfacial Momentum Closures in Coarse-Mesh CFD Two-Phase Flow Simulation Using Validation Data

Multiphase flow phenomena have been widely observed in the industrial applications, yet it remains a challenging unsolved problem. Three-dimensional computational fluid dynamics (CFD) approaches resolve of the flow fields on finer spatial and temporal scales, which can complement dedicated experimental study. However, closures must be introduced to reflect the underlying physics in multiphase flow. Among them, the interfacial forces, including drag, lift, turbulent-dispersion and wall-lubrication forces, play an important role in bubble distribution and migration in liquid-vapor two-phase flows. Development of those closures traditionally rely on the experimental data and analytical derivation with simplified assumptions that usually cannot deliver a universal solution across a wide range of flow conditions. In this paper, a data-driven approach, named as feature-similarity measurement (FSM), is developed and applied to improve the simulation capability of two-phase flow with coarse-mesh CFD approach. Interfacial momentum transfer in adiabatic bubbly flow serves as the focus of the present study. Both a mature and a simplified set of interfacial closures are taken as the low-fidelity data. Validation data (including relevant experimental data and validated fine-mesh CFD simulations results) are adopted as high-fidelity data. Qualitative and quantitative analysis are performed in this paper. These reveal that FSM can substantially improve the prediction of the coarse-mesh CFD model, regardless of the choice of interfacial closures, and it provides scalability and consistency across discontinuous flow regimes. It demonstrates that data-driven methods can aid the multiphase flow modeling by exploring the connections between local physical features and simulation errors.

preprint2020arXiv

Learning from Noisy Similar and Dissimilar Data

With the widespread use of machine learning for classification, it becomes increasingly important to be able to use weaker kinds of supervision for tasks in which it is hard to obtain standard labeled data. One such kind of supervision is provided pairwise---in the form of Similar (S) pairs (if two examples belong to the same class) and Dissimilar (D) pairs (if two examples belong to different classes). This kind of supervision is realistic in privacy-sensitive domains. Although this problem has been looked at recently, it is unclear how to learn from such supervision under label noise, which is very common when the supervision is crowd-sourced. In this paper, we close this gap and demonstrate how to learn a classifier from noisy S and D labeled data. We perform a detailed investigation of this problem under two realistic noise models and propose two algorithms to learn from noisy S-D data. We also show important connections between learning from such pairwise supervision data and learning from ordinary class-labeled data. Finally, we perform experiments on synthetic and real world datasets and show our noise-informed algorithms outperform noise-blind baselines in learning from noisy pairwise data.

preprint2020arXiv

Using Deep Learning to Explore Local Physical Similarity for Global-scale Bridging in Thermal-hydraulic Simulation

Current system thermal-hydraulic codes have limited credibility in simulating real plant conditions, especially when the geometry and boundary conditions are extrapolated beyond the range of test facilities. This paper proposes a data-driven approach, Feature Similarity Measurement FFSM), to establish a technical basis to overcome these difficulties by exploring local patterns using machine learning. The underlying local patterns in multiscale data are represented by a set of physical features that embody the information from a physical system of interest, empirical correlations, and the effect of mesh size. After performing a limited number of high-fidelity numerical simulations and a sufficient amount of fast-running coarse-mesh simulations, an error database is built, and deep learning is applied to construct and explore the relationship between the local physical features and simulation errors. Case studies based on mixed convection have been designed for demonstrating the capability of data-driven models in bridging global scale gaps.

preprint2019arXiv

Measurements with prediction and retrodiction on the collective spin of 10^{11} atoms beat the standard quantum limit

Quantum probes using $N$ uncorrelated particles give a limit on the measurement sensitivity referred to as the standard quantum limit (SQL). The SQL, however, can be overcome by exploiting quantum entangled states, such as spin squeezed states. We report generation of a quantum state, that surpasses the SQL for probing of the collective spin of $10^{11}$ $\text{Rb}$ atoms contained in a vapor cell. The state is prepared and verified by sequences of stroboscopic quantum non-demolition (QND) measurements, and we apply the theory of past quantum states to obtain the spin state information from the outcomes of both earlier and later QND measurements. In this way, we obtain a conditional noise reduction of 5.6 dB, and a metrologically-relevant squeezing of $4.5\pm0.40~\text{dB}$. The past quantum state yields tighter information on the spin component than we can obtain by a conventional QND measurement. Our squeezing results are obtained with 1000 times more atoms than in any previous experiments with a corresponding record $4.6\times10^{-13} rad^2$ variance of the angular fluctuations of a squeezed collective spin.