Source author record

Chen Zhao

Chen Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

41 works

35 topics

4 close collaborators

preprint, 2026, arXiv

Correct Code, Vulnerable Dependencies: A Large Scale Measurement Study of LLM-Specified Library Versions

Large language models (LLMs) are now heavily involved in software development workflows, and the code they generate routinely includes third-party library (TPL) imports annotated with specific version identifiers. These version choices can carry security and compatibility risks, yet they have not been systematically studied. We present the first large-scale measurement study of version-level risk in LLM-generated Python code, evaluating 10 LLMs on PinTrace, a curated benchmark of 1,000 Stack Overflow programming tasks. LLMs specify version identifiers at rates of 26.83%-95.18% when directly prompted, dropping to 6.45%-59.19% when creating a manifest file directly. Among the specified versions, 36.70%-55.70% of tasks contain at least one known CVE, and 62.75%-74.51% of them carry Critical or High severity ratings. In 72.27%-91.37% of cases, the associated CVEs were publicly disclosed before the model's knowledge cutoff. The statistics show that all models converge on the same small set of risky release versions, indicating a systemic bias rather than isolated model error. Static compatibility rates range from 19.70% to 63.20%, with installation failure as the dominant cause. Dynamic test cases confirm the pattern, with pass rates of 6.49%-48.62%. Further experiments confirm that these failures are attributable to version selection rather than code quality, and that externally anchored version constraints substantially reduce both vulnerability exposure and compatibility failures. Our findings reveal LLM version selection as a first-class, previously overlooked risk surface in LLM-based development. We disclosed these findings to the community of the evaluated models, and several confirmed the issue. All the code and dataset have been released for open science at https://github.com/dw763j/PinTrace.
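
As a rough illustration of the kind of quantity this abstract reports, the sketch below computes the share of entries in a requirements-style manifest that pin an exact version with `==`. The function name `pin_rate`, the regular expression, and the sample manifest are all hypothetical and are not taken from the PinTrace tooling; treating only `==` as a "specified version" is an assumption of this sketch.

```python
import re

# Assumption: only exact pins such as "requests==2.31.0" count as a specified version.
PIN_PATTERN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*==\s*([A-Za-z0-9_.\-+!]+)\s*$")


def pin_rate(manifest_text: str) -> float:
    """Return the fraction of non-empty, non-comment lines that pin an exact version."""
    lines = [ln.strip() for ln in manifest_text.splitlines()]
    entries = [ln for ln in lines if ln and not ln.startswith("#")]
    if not entries:
        return 0.0
    pinned = sum(1 for ln in entries if PIN_PATTERN.match(ln))
    return pinned / len(entries)


# Hypothetical manifest: two of the four entries are exact pins.
sample = """
# hypothetical manifest
requests==2.31.0
numpy
flask>=2.0
pyyaml==6.0.1
"""
print(pin_rate(sample))
```

A real measurement would also need to resolve each pinned version against a vulnerability database, which this sketch does not attempt.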

preprint, 2026, arXiv

Harnessing LLM Agents with Skill Programs

Equipping LLM agents with reusable skills derived from past experience has become a popular and successful approach for tackling complex and long-horizon tasks. However, such lessons are often encoded as textual guidance that remains largely advisory, lacking explicit mechanisms for when and how to intervene in the agent loop. To bridge the gap, we introduce HASP (Harnessing LLM Agents with Skill Programs), a new framework that upgrades skills into executable Program Functions (PFs). Rather than offering passive advice, PFs act as executable guardrails that activate on failure-prone states and modify the next action or inject corrective context. HASP is highly modular: it can be applied at inference time for direct agent-loop intervention, during post-training to provide structured supervision, or for self-improvement by evolving validated, teacher-reviewed PFs. Empirically, HASP drives substantial gains compared to both training-free and training-based methods on web-search, math reasoning, and coding tasks. For example, on web-search reasoning, inference-time PFs alone improve the average performance by 25% compared to a (multi-loop) ReAct agent, while post-training and controlled evolution achieve a 30.4% gain over Search-R1. To provide deeper insights into HASP, our mechanism analysis reveals how PFs trigger and intervene, how skills are internalized, and the requirements for stable skill library evolution.

preprint, 2026, arXiv

Histopathology-centered Computational Evolution of Spatial Omics: Integration, Mapping, and Foundation Models

Spatial omics (SO) technologies enable spatially resolved molecular profiling, while hematoxylin and eosin (H&E) imaging remains the gold standard for morphological assessment in clinical pathology. Recent computational advances increasingly place H&E images at the center of SO analysis, bridging morphology with transcriptomic, proteomic, and other spatial molecular modalities, and pushing resolution toward the single-cell level. In this survey, we systematically review the computational evolution of SO from a histopathology-centered perspective and organize existing methods into three paradigms: integration, which jointly models paired multimodal data; mapping, which infers molecular profiles from H&E images; and foundation models, which learn generalizable representations from large-scale spatial datasets. We analyze how the role of H&E images evolves across these paradigms from spatial context to predictive anchor and ultimately to representation backbone in response to practical constraints such as limited paired data and increasing resolution demands. We further summarize actionable modeling directions enabled by current architectures and delineate persistent gaps driven by data, biology, and technology that are unlikely to be resolved by model design alone. Together, this survey provides a histopathology-centered roadmap for developing and applying computational frameworks in SO.

preprint, 2026, arXiv

Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems

Reasoning-intensive retrieval aims to surface evidence that supports downstream reasoning rather than merely matching topical similarity. This capability is increasingly important for agentic search systems, where retrievers must provide complementary evidence across iterative search and synthesis. However, existing work remains limited on both evaluation and training: benchmarks such as BRIGHT provide narrow gold sets and evaluate retrievers in isolation, while synthetic training corpora often optimize single-passage relevance rather than evidence portfolio construction. We introduce BRIGHT-Pro, an expert-annotated benchmark that expands each query with multi-aspect gold evidence and evaluates retrievers under both static and agentic search protocols. We further construct RTriever-Synth, an aspect-decomposed synthetic corpus that generates complementary positives and positive-conditioned hard negatives, and use it to LoRA fine-tune RTriever-4B from Qwen3-Embedding-4B. Experiments across lexical, general-purpose, and reasoning-intensive retrievers show that aspect-aware and agentic evaluation expose behaviors hidden by standard metrics, while RTriever-4B substantially improves over its base model.

preprint, 2026, arXiv

The Unlearnability Phenomenon in RLVR for Language Models

Reinforcement Learning with Verifiable Reward (RLVR) has proven effective in improving the reasoning ability of Large Language Models (LLMs). However, the learning dynamics of RLVR remain underexplored. In this paper, we reveal a counterintuitive phenomenon: among hard examples that the model initially struggles with, a substantial subset remains unlearnable even when correct rollouts are present. To understand the phenomenon, we first demonstrate that existing optimization and sampling techniques fail to resolve unlearnability. With cross-example gradient analysis, we show that unlearnable examples have a fundamental representation issue, characterized by low gradient similarity with the rest of the examples and ungeneralizable reasoning patterns. We further show that representation flaws are difficult to mitigate in RL, as data augmentation does not improve gradient similarity. Our study provides the first systematic characterization of unlearnable data in RLVR training and reveals fundamental limitations in current RL approaches for reasoning tasks. Code and data are available at https://github.com/yulinchen99/unlearnability-rlvr.

preprint, 2026, arXiv

YOLO-IOD: Towards Real Time Incremental Object Detection

Current methods for incremental object detection (IOD) primarily rely on Faster R-CNN or DETR series detectors; however, these approaches do not accommodate the real-time YOLO detection frameworks. In this paper, we first identify three primary types of knowledge conflicts that contribute to catastrophic forgetting in YOLO-based incremental detectors: foreground-background confusion, parameter interference, and misaligned knowledge distillation. Subsequently, we introduce YOLO-IOD, a real-time IOD framework that is constructed upon the pretrained YOLO-World model, facilitating incremental learning via a stage-wise parameter-efficient fine-tuning process. Specifically, YOLO-IOD encompasses three principal components: 1) Conflict-Aware Pseudo-Label Refinement (CPR), which mitigates the foreground-background confusion by leveraging the confidence levels of pseudo labels and identifying potential objects relevant to future tasks; 2) Importance-based Kernel Selection (IKS), which identifies and updates the pivotal convolution kernels pertinent to the current task during the current learning stage; and 3) Cross-Stage Asymmetric Knowledge Distillation (CAKD), which addresses the misaligned knowledge distillation conflict by transmitting the features of the student target detector through the detection heads of both the previous and current teacher detectors, thereby facilitating asymmetric distillation between existing and newly introduced categories. We further introduce LoCo COCO, a more realistic benchmark that eliminates data leakage across stages. Experiments on both conventional and LoCo COCO benchmarks show that YOLO-IOD achieves superior performance with minimal forgetting.

preprint, 2022, arXiv

Active information, missing data and prevalence estimation

The topic of this paper is prevalence estimation from the perspective of active information. Prevalence among tested individuals has an upward bias under the assumption that individuals' willingness to be tested for the disease increases with the strength of their symptoms. Active information due to testing bias quantifies the degree at which the willingness to be tested correlates with infection status. Interpreting incomplete testing as a missing data problem, the missingness mechanism impacts the degree at which the bias of the original prevalence estimate can be removed. The reduction in prevalence, when testing bias is adjusted for, translates into an active information due to bias correction, with opposite sign to active information due to testing bias. Prevalence and active information estimates are asymptotically normal, a behavior also illustrated through simulations.

preprint, 2022, arXiv

Adaptive Fairness-Aware Online Meta-Learning for Changing Environments

The fairness-aware online learning framework has arisen as a powerful tool for the continual lifelong learning setting. The goal for the learner is to sequentially learn new tasks as they arrive one after another over time, while ensuring statistical parity of each new task across different protected sub-populations (e.g. race and gender). A major drawback of existing methods is that they make heavy use of the i.i.d. assumption for data and hence provide static regret analysis for the framework. However, low static regret does not imply good performance in changing environments where tasks are sampled from heterogeneous distributions. To address the fairness-aware online learning problem in changing environments, in this paper, we first construct a novel regret metric FairSAR by adding long-term fairness constraints onto a strongly adapted loss regret. Furthermore, to determine a good model parameter at each round, we propose a novel adaptive fairness-aware online meta-learning algorithm, namely FairSAOML, which is able to adapt to changing environments in both bias control and model precision. The problem is formulated in the form of a bi-level convex-concave optimization with respect to the model's primal and dual parameters that are associated with the model's accuracy and fairness, respectively. The theoretical analysis provides sub-linear upper bounds for both loss regret and violation of cumulative fairness constraints. Our experimental evaluation on different real-world datasets with settings of changing environments suggests that the proposed FairSAOML significantly outperforms alternatives based on the best prior online learning approaches.

preprint, 2022, arXiv

Capillary rising in a tube with corners

We study the dynamics of a fluid rising in a capillary tube with corners. In the cornered tube, unlike the circular tube, the fluid rises in two parts: the bulk part, where the entire cross-section is occupied by the fluid, and the finger part, where the cross-section is only partially filled. Using the Onsager principle, we derive coupled time-evolution equations for the two parts. We show that (a) at the early stage of rising, the dynamics is dominated by the bulk part and the fluid height $h_0(t)$ shows the same behavior as that in the circular tube, and (b) at the late stage, the bulk part stops rising, but the finger part keeps rising following the scaling law of $h_1(t) \sim t^{1/3}$. We also show that due to the coupling between the two parts, the equilibrium bulk height is smaller than Jurin's height, which ignores the effect of the finger part.

preprint, 2022, arXiv

Ego4D: Around the World in 3,000 Hours of Egocentric Video

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception. Project page: https://ego4d-data.org/

preprint, 2022, arXiv

End-to-End Active Speaker Detection

Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process: feature extraction and spatio-temporal context aggregation. In this paper, we propose an end-to-end ASD workflow where feature learning and contextual predictions are jointly learned. Our end-to-end trainable network simultaneously learns multi-modal embeddings and aggregates spatio-temporal context. This results in more suitable feature representations and improved performance in the ASD task. We also introduce interleaved graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem. Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance. Finally, we design a weakly-supervised strategy, which demonstrates that the ASD problem can also be approached by utilizing audiovisual data but relying exclusively on audio annotations. We achieve this by modelling the direct relationship between the audio signal and the possible sound sources (speakers), as well as introducing a contrastive loss. All the resources of this project will be made available at: https://github.com/fuankarion/end-to-end-asd.

preprint, 2022, arXiv

Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects

In this paper, we tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images. This task contrasts with the one considered by most existing deep learning methods which typically assume that the testing objects have been observed during training. To handle the unseen objects, we follow a retrieval-based strategy and prevent the network from learning object-specific features by computing multi-scale local similarities between the query image and synthetically-generated reference images. We then introduce an adaptive fusion module that robustly aggregates the local similarities into a global similarity score of pairwise images. Furthermore, we speed up the retrieval process by developing a fast retrieval strategy. Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works. Our code and pre-trained models are available at https://sailor-z.github.io/projects/Unseen_Object_Pose.html.

preprint, 2022, arXiv

Layer Adaptive Deep Neural Networks for Out-of-distribution Detection

During the forward pass of Deep Neural Networks (DNNs), inputs are gradually transformed from low-level features to high-level conceptual labels. While features at different layers could summarize the important factors of the inputs at varying levels, modern out-of-distribution (OOD) detection methods mostly focus on utilizing their ending layer features. In this paper, we propose a novel layer-adaptive OOD detection framework (LA-OOD) for DNNs that can fully utilize the intermediate layers' outputs. Specifically, instead of training a unified OOD detector at a fixed ending layer, we train multiple One-Class SVM OOD detectors simultaneously at the intermediate layers to exploit the full spectrum characteristics encoded at varying depths of DNNs. We develop a simple yet effective layer-adaptive policy to identify the best layer for detecting each potential OOD example. LA-OOD can be applied to any existing DNNs and does not require access to OOD samples during training. Using three DNNs of varying depth and architectures, our experiments demonstrate that LA-OOD is robust against OODs of varying complexity and can outperform state-of-the-art competitors by a large margin on some real-world datasets.

preprint, 2022, arXiv

MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques. In comparison, limited effort has been made at assessing the fitness of these datasets for the video-language grounding task. Recent works have begun to discover significant limitations in these datasets, suggesting that state-of-the-art techniques commonly overfit to hidden dataset biases. In this work, we present MAD (Movie Audio Descriptions), a novel benchmark that departs from the paradigm of augmenting existing video datasets with text annotations and focuses on crawling and aligning available audio descriptions of mainstream movies. MAD contains over 384,000 natural language sentences grounded in over 1,200 hours of videos and exhibits a significant reduction in the currently diagnosed biases for video-language grounding datasets. MAD's collection strategy enables a novel and more challenging version of video-language grounding, where short temporal moments (typically seconds long) must be accurately grounded in diverse long-form videos that can last up to three hours. We have released MAD's data and baselines code at https://github.com/Soldelli/MAD.

preprint, 2022, arXiv

R-DFCIL: Relation-Guided Representation Learning for Data-Free Class Incremental Learning

Class-Incremental Learning (CIL) struggles with catastrophic forgetting when learning new knowledge, and Data-Free CIL (DFCIL) is even more challenging without access to the training data of previously learned classes. Though recent DFCIL works introduce techniques such as model inversion to synthesize data for previous classes, they fail to overcome forgetting due to the severe domain gap between the synthetic and real data. To address this issue, this paper proposes relation-guided representation learning (RRL) for DFCIL, dubbed R-DFCIL. In RRL, we introduce relational knowledge distillation to flexibly transfer the structural relation of new data from the old model to the current model. Our RRL-boosted DFCIL can guide the current model to learn representations of new classes better compatible with representations of previous classes, which greatly reduces forgetting while improving plasticity. To avoid the mutual interference between representation and classifier learning, we employ local rather than global classification loss during RRL. After RRL, the classification head is refined with global class-balanced classification loss to address the data imbalance issue as well as learn the decision boundaries between new and previous classes. Extensive experiments on CIFAR100, Tiny-ImageNet200, and ImageNet100 demonstrate that our R-DFCIL significantly surpasses previous approaches and achieves a new state-of-the-art performance for DFCIL. Code is available at https://github.com/jianzhangcs/R-DFCIL

preprint, 2022, arXiv

Unsupervised Learning of 3D Semantic Keypoints with Mutual Reconstruction

Semantic 3D keypoints are category-level, semantically consistent points on 3D objects. Detecting 3D semantic keypoints is a foundation for a number of 3D vision tasks but remains challenging, due to the ambiguity of semantic information, especially when the objects are represented by unordered 3D point clouds. Existing unsupervised methods tend to generate category-level keypoints in implicit manners, making it difficult to extract high-level information, such as semantic labels and topology. From a novel mutual reconstruction perspective, we present an unsupervised method to generate consistent semantic keypoints from point clouds explicitly. To achieve this, the proposed model predicts keypoints that not only reconstruct the object itself but also reconstruct other instances in the same category. To the best of our knowledge, the proposed method is the first to mine semantically consistent 3D keypoints from a mutual reconstruction view. Experiments under various evaluation metrics, as well as comparisons with the state of the art, demonstrate the efficacy of our new solution to mining semantically consistent keypoints with mutual reconstruction.

preprint, 2022, arXiv

When NAS Meets Trees: An Efficient Algorithm for Neural Architecture Search

The key challenge in neural architecture search (NAS) is designing how to explore wisely in the huge search space. We propose a new NAS method called TNAS (NAS with trees), which improves search efficiency by exploring only a small number of architectures while also achieving a higher search accuracy. TNAS introduces an architecture tree and a binary operation tree, to factorize the search space and substantially reduce the exploration size. TNAS performs a modified bi-level Breadth-First Search in the proposed trees to discover a high-performance architecture. Impressively, TNAS finds the global optimal architecture on CIFAR-10 with test accuracy of 94.37% in four GPU hours in NAS-Bench-201. The average test accuracy is 94.35%, which outperforms the state-of-the-art. Code is available at: https://github.com/guochengqian/TNAS.

preprint, 2021, arXiv

$L^2$-representation of Hodge Modules

Over an arbitrary compact complex space or an arbitrary germ of complex space $X$, we provide fine resolutions of pure Hodge modules with strict supports $IC_X(\mathbb{V})$ via differential forms with locally $L^2$ boundary conditions. When $\mathbb{V}=\mathbb{C}_{X_{\rm reg}}$ is the trivial variation of Hodge structure, we give a solution to a Cheeger-Goresky-MacPherson type conjecture: For any compact complex space $X$, there is a complete hermitian metric $ds^2$ on $X_{\rm reg}$ such that there is a canonical isomorphism $$H^i_{(2)}(X_{\rm reg},ds^2)\simeq IH^i(X),\quad \forall i.$$ Such metric $ds^2$ could be Kähler if $X$ is a Kähler space. As an application, we give a differential geometrical proof of the Kähler package of the hypercohomology of pure Hodge modules. We also prove the Kähler version of Kashiwara's conjecture in the absolute case.

preprint, 2021, arXiv

A Deep Learning-Based Approach to Extracting Periosteal and Endosteal Contours of Proximal Femur in Quantitative CT Images

Automatic CT segmentation of proximal femur is crucial for the diagnosis and risk stratification of orthopedic diseases; however, current methods for the femur CT segmentation mainly rely on manual interactive segmentation, which is time-consuming and has limitations in both accuracy and reproducibility. In this study, we proposed an approach based on deep learning for the automatic extraction of the periosteal and endosteal contours of proximal femur in order to differentiate cortical and trabecular bone compartments. A three-dimensional (3D) end-to-end fully convolutional neural network, which can better combine the information between neighbor slices and get more accurate segmentation results, was developed for our segmentation task. 100 subjects aged from 50 to 87 years with 24,399 slices of proximal femur CT images were enrolled in this study. The separation of cortical and trabecular bone derived from the QCT software MIAF-Femur was used as the segmentation reference. We randomly divided the whole dataset into a training set with 85 subjects for 10-fold cross-validation and a test set with 15 subjects for evaluating the performance of models. Two models with the same network structures were trained and they achieved a dice similarity coefficient (DSC) of 97.87% and 96.49% for the periosteal and endosteal contours, respectively. To verify the excellent performance of our model for femoral segmentation, we measured the volume of different parts of the femur and compared it with the ground truth and the relative errors between predicted result and ground truth are all less than 5%. It demonstrated a strong potential for clinical use, including the hip fracture risk prediction and finite element analysis.
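
For readers unfamiliar with the overlap metric quoted in this abstract, the following minimal sketch computes the Dice similarity coefficient for two binary masks, along with the closely related Jaccard measure. It is not the authors' evaluation code: the function name `dice_and_jaccard`, the flattened-list mask representation, and the toy masks are assumptions made for illustration.

```python
def dice_and_jaccard(pred, truth):
    """Compute (Dice, Jaccard) for two equal-length binary masks given as flat 0/1 lists."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - intersection
    if union == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_size + truth_size)
    jaccard = intersection / union
    return dice, jaccard


# Toy masks (hypothetical): 3 predicted pixels, 3 reference pixels, 2 shared.
pred = [1, 1, 1, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0]
dice, jaccard = dice_and_jaccard(pred, truth)
print(dice, jaccard)
```

The two metrics are monotonically related (Jaccard = Dice / (2 - Dice)), so a model that ranks higher on one also ranks higher on the other.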

preprint, 2021, arXiv

A Deep Learning-based Method to Extract Lumen and Media-Adventitia in Intravascular Ultrasound Images

Intravascular ultrasound (IVUS) imaging allows direct visualization of the coronary vessel wall and is suitable for the assessment of atherosclerosis and the degree of stenosis. Accurate segmentation and measurements of lumen and media-adventitia (MA) from IVUS are essential for such a successful clinical evaluation. However, current segmentation relies on manual operations, which is time-consuming and user-dependent. In this paper, we aim to develop a deep learning-based method using an encoder-decoder deep architecture to automatically extract both lumen and MA border. Our method named IVUS-U-Net++ is an extension of the well-known U-Net++ model. More specifically, a feature pyramid network was added to the U-Net++ model, enabling the utilization of feature maps at different scales. As a result, the accuracy of the probability map and subsequent segmentation have been improved. We collected 1746 IVUS images from 18 patients in this study. The whole dataset was split into a training dataset (1572 images) for the 10-fold cross-validation and a test dataset (174 images) for evaluating the performance of models. Our IVUS-U-Net++ segmentation model achieved a Jaccard measure (JM) of 0.9412 and a Hausdorff distance (HD) of 0.0639 mm for the lumen border, and a JM of 0.9509 and an HD of 0.0867 mm for the MA border. Moreover, the Pearson correlation and Bland-Altman analyses were performed to evaluate the correlations of 12 clinical parameters measured from our segmentation results and the ground truth, and automatic measurements agreed well with those from the ground truth (all Ps<0.01). In conclusion, our preliminary results demonstrate that the proposed IVUS-U-Net++ model has great promise for clinical use.

preprint, 2021, arXiv

A new approach to extracting coronary arteries and detecting stenosis in invasive coronary angiograms

In stable coronary artery disease (CAD), reduction in mortality and/or myocardial infarction with revascularization over medical therapy has not been reliably achieved. Coronary arteries are usually extracted to perform stenosis detection. We aim to develop an automatic algorithm by deep learning to extract coronary arteries from invasive coronary angiograms (ICAs). In this study, a multi-input and multi-scale (MIMS) U-Net with a two-stage recurrent training strategy was proposed for the automatic vessel segmentation. Incorporating features such as the Inception residual module with depth-wise separable convolutional layers, the proposed model generated a refined prediction map with the following two training stages: (i) Stage I coarsely segmented the major coronary arteries from pre-processed single-channel ICAs and generated the probability map of vessels; (ii) during Stage II, a three-channel image consisting of the original preprocessed image, a generated probability map, and an edge-enhanced image generated from the preprocessed image was fed to the proposed MIMS U-Net to produce the final segmentation probability map. During the training stage, the probability maps were iteratively and recurrently updated by feeding into the neural network. After segmentation, an arterial stenosis detection algorithm was developed to extract vascular centerlines and calculate arterial diameters to evaluate stenotic level. Experimental results demonstrated that the proposed method achieved an average Dice score of 0.8329, an average sensitivity of 0.8281, and an average specificity of 0.9979 in our dataset with 294 ICAs obtained from 73 patients. Moreover, our stenosis detection algorithm achieved a true positive rate of 0.6668 and a positive predictive value of 0.7043.

preprint, 2021, arXiv

A Primal-Dual Subgradient Approach for Fair Meta Learning

The problem of learning to generalize to classes unseen during training, known as few-shot classification, has attracted considerable attention. Initialization-based methods, such as the gradient-based model-agnostic meta-learning (MAML), tackle the few-shot learning problem by "learning to fine-tune". The goal of these approaches is to learn a proper model initialization, so that the classifiers for new classes can be learned from a few labeled examples with a small number of gradient update steps. Few-shot meta-learning is well known for its fast adaptation and accurate generalization to unseen tasks. Learning fairly with unbiased outcomes is another significant hallmark of human intelligence, which is rarely touched in few-shot meta-learning. In this work, we propose a Primal-Dual Fair Meta-learning framework, namely PDFM, which learns to train fair machine learning models using only a few examples based on data from related tasks. The key idea is to learn a good initialization of a fair model's primal and dual parameters so that it can adapt to a new fair learning task via a few gradient update steps. Instead of manually tuning the dual parameters as hyperparameters via a grid search, PDFM optimizes the initialization of the primal and dual parameters jointly for fair meta-learning via a subgradient primal-dual approach. We further instantiate examples of bias control using mean difference and decision boundary covariance as fairness constraints for each task for supervised regression and classification, respectively. We demonstrate the versatility of our proposed approach by applying it to various real-world datasets. Our experiments show substantial improvements over the best prior work for this setting.

preprint2021arXiv

Wetting dynamics in an angular channel

We analyze the dynamics of liquid filling in a thin, slightly inflated rectangular channel driven by capillary forces. We show that although the amount of liquid $m$ in the channel increases in time following the classical Lucas-Washburn law, $m \propto t^{1/2}$, the prefactor is very sensitive to the deformation of the channel because the filling takes place by the growth of two parts, the bulk part (where the cross-section is completely filled by the liquid), and the finger part (where the cross-section is partially filled). We calculate the time dependence of $m$ accounting for the coupling between the two parts and show that the prefactor for the filling can be reduced significantly by a slight deformation of the rectangular channel, e.g., the prefactor is reduced by 50% for a strain of 0.1%. This offers an explanation for the large deviation in the value of the prefactor reported previously.
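As a rough illustration of what a change in the prefactor means in practice, the Lucas-Washburn scaling can be inverted for the filling time. The symbol $k$ for the prefactor is introduced here only for illustration and does not appear in the abstract.

```latex
m(t) = k\,t^{1/2}
\quad\Longrightarrow\quad
t(m) = \left(\frac{m}{k}\right)^{2},
\qquad
\frac{t_{k/2}(m)}{t_{k}(m)} = \frac{(2m/k)^{2}}{(m/k)^{2}} = 4 .
```

Under this scaling alone, halving the prefactor quadruples the time needed to take up the same amount of liquid.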

preprint2020arXiv

3D Fusion between Fluoroscopy Angiograms and SPECT Myocardial Perfusion Images to Guide Percutaneous Coronary Intervention

Background. Percutaneous coronary intervention (PCI) in stable coronary artery disease (CAD) is commonly triggered by abnormal myocardial perfusion imaging (MPI). However, due to the possibilities of multivessel disease and variability of coronary artery perfusion distribution, opportunity exists to better align anatomic stenosis with perfusion abnormalities to improve revascularization decisions. This study aims to develop a 3D multi-modality fusion approach to assist decision-making for PCI. Methods. Coronary arteries from fluoroscopic angiography (FA) were reconstructed into 3D artery anatomy. The left ventricular (LV) epicardial surface was extracted from SPECT. The 3D artery anatomy was non-rigidly fused with the LV epicardial surface. The accuracy of the 3D fusion was evaluated via both computer simulation and real patient data. For technical validation, simulated FA and MPI were integrated and then compared with the ground truth from a digital phantom. For clinical validation, FA and SPECT images were integrated and then compared with the ground truth from CT angiograms. Results. In the technical evaluation, the distance-based mismatch error between simulated fluoroscopy and phantom arteries was 1.86 (SD: 1.43) mm for left coronary arteries (LCA) and 2.21 (SD: 2.50) mm for right coronary arteries (RCA). In the clinical validation, the distance-based mismatch errors between the fluoroscopy and CT arteries were 3.84 (SD: 3.15) mm for LCA and 5.55 (SD: 3.64) mm for RCA. The presence of the corresponding fluoroscopy and CT arteries in the AHA 17-segment model agreed well, with a Kappa value of 0.91 (95% CI: 0.89-0.93) for LCA and 0.80 (CI: 0.67-0.92) for RCA. Conclusions. Our fusion approach is technically accurate to assist PCI decision-making and is clinically feasible to be used in the catheterization laboratory. There is an opportunity to improve the decision-making and outcomes of PCI in stable CAD.

preprint2020arXiv

A Deep Learning-Based Method for Automatic Segmentation of Proximal Femur from Quantitative Computed Tomography Images

Purpose: Proximal femur image analyses based on quantitative computed tomography (QCT) provide a method to quantify the bone density and evaluate osteoporosis and risk of fracture. We aim to develop a deep-learning-based method for automatic proximal femur segmentation. Methods and Materials: We developed a 3D image segmentation method based on V-Net, an end-to-end fully convolutional neural network (CNN), to extract the proximal femur from QCT images automatically. The proposed V-Net methodology adopts a compound loss function, which includes a Dice loss and an L2 regularizer. We performed experiments to evaluate the effectiveness of the proposed segmentation method. In the experiments, a QCT dataset of 397 subjects was used. For the QCT image of each subject, the ground truth for the proximal femur was delineated by a well-trained scientist. In the experiments for the entire cohort, and then for male and female subjects separately, 90% of the subjects were used in 10-fold cross-validation for training and internal validation, and to select the optimal parameters of the proposed models; the rest of the subjects were used to evaluate the performance of the models. Results: Visual comparison demonstrated high agreement between the model prediction and ground truth contours of the proximal femur portion of the QCT images. In the entire cohort, the proposed model achieved a Dice score of 0.9815, a sensitivity of 0.9852 and a specificity of 0.9992. In addition, an R2 score of 0.9956 (p<0.001) was obtained when comparing the volumes measured by our model prediction with the ground truth. Conclusion: This method shows great promise for clinical application to QCT and QCT-based finite element analysis of the proximal femur for evaluating osteoporosis and hip fracture risk.
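For readers unfamiliar with the Dice score, sensitivity and specificity reported for segmentation results, the following is a minimal sketch of how these are commonly computed from a predicted and a reference binary mask. It is not the authors' evaluation code; the function name `segmentation_metrics`, the flattened-list mask representation, and the convention of returning 1.0 when a denominator is zero are assumptions made for illustration.

```python
def segmentation_metrics(pred, truth):
    """Return (dice, sensitivity, specificity) for two flat binary masks.

    pred and truth are equal-length sequences of 0/1 voxel labels
    (a hypothetical representation chosen for this sketch).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")

    # Count the four confusion-matrix cells voxel by voxel.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)

    # Dice = 2|A ∩ B| / (|A| + |B|); empty denominators default to 1.0 (assumption).
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 1.0
    specificity = tn / (tn + fp) if (tn + fp) else 1.0
    return dice, sensitivity, specificity


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    reference = [1, 0, 0, 0, 1, 1]
    print(segmentation_metrics(predicted, reference))
```

Because background voxels usually far outnumber foreground voxels in medical volumes, specificity tends to sit close to 1 even when Dice is noticeably lower, which is consistent with the pattern of values quoted in the abstracts on this page.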

preprint2020arXiv

A Novel Method for ECG Signal Classification via One-Dimensional Convolutional Neural Network

This paper presents an end-to-end ECG signal classification method based on a novel segmentation strategy and 1D Convolutional Neural Networks (CNN). The segmentation strategy, named the R-R-R strategy (i.e., retaining ECG data between the R peaks just before and after the current R peak), splits the original ECG data into segments that are used to train and test the 1D CNN models. The novel strategy mimics physicians in scanning ECG to a greater extent, and maximizes the inherent information of ECG segments. The performance of the classification models for 5-class and 6-class settings is verified with ECG signals from 48 records of the MIT-BIH arrhythmia database. When the heartbeat types are divided into 5 classes (i.e., normal beat, left bundle branch block beat, right bundle branch block beat, ventricular ectopic beat, and paced beat) in the MIT-BIH, the best classification accuracy, the area under the curve (AUC), the sensitivity and the F1-score reach 99.24%, 0.9994, 0.99 and 0.99, respectively. When the heartbeat types are divided into 6 classes (i.e., normal beat, left bundle branch block beat, right bundle branch block beat, ventricular ectopic beat, paced beat and other beats) in the MIT-BIH, the beat classification accuracy, the AUC, the sensitivity, and the F1-score reach 97.02%, 0.9966, 0.97, and 0.97, respectively. Meanwhile, when the heartbeat types are divided into 5 classes according to the recommended practice from the Association for Advancement of Medical Instrumentation (AAMI) (i.e., normal beat, supraventricular ectopic beats, ventricular ectopic beats, fusion beats, and unclassifiable beats), the beat classification accuracy, the sensitivity, and the F1-score reach 97.45%, 0.97, and 0.97, respectively. The experimental results show that the proposed method achieves better performance than the state-of-the-art methods.
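The R-R-R idea described above (keeping the samples between the R peak before and the R peak after the current one) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the function name `rrr_segments`, the inclusive end point, and the assumption that R-peak sample indices are already available are all hypothetical, and any resampling of segments to a fixed length for the CNN is left out.

```python
def rrr_segments(signal, r_peaks):
    """Cut one segment per interior R peak, spanning previous to next R peak.

    signal  : sequence of ECG samples
    r_peaks : sorted sample indices of detected R peaks (assumed given)
    The first and last peaks have no neighbour on one side and are skipped.
    """
    segments = []
    for i in range(1, len(r_peaks) - 1):
        start = r_peaks[i - 1]   # R peak just before the current beat
        end = r_peaks[i + 1]     # R peak just after the current beat
        segments.append(list(signal[start:end + 1]))  # inclusive end (assumption)
    return segments


if __name__ == "__main__":
    ecg = list(range(20))        # stand-in samples, not real ECG data
    peaks = [2, 7, 11, 18]
    for seg in rrr_segments(ecg, peaks):
        print(len(seg), seg[0], seg[-1])
```

Each segment therefore contains two full R-R intervals centred on the beat being classified, so segments vary in length with heart rate.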

preprint2020arXiv

Attention Convolutional Binary Neural Tree for Fine-Grained Visual Categorization

Fine-grained visual categorization (FGVC) is an important but challenging task due to high intra-class variances and low inter-class variances caused by deformation, occlusion, illumination, etc. An attention convolutional binary neural tree architecture is presented to address those problems for weakly supervised FGVC. Specifically, we incorporate convolutional operations along edges of the tree structure, and use the routing functions in each node to determine the root-to-leaf computational paths within the tree. The final decision is computed as the summation of the predictions from leaf nodes. The deep convolutional operations learn to capture the representations of objects, and the tree structure characterizes the coarse-to-fine hierarchical feature learning process. In addition, we use the attention transformer module to encourage the network to capture discriminative features. The negative log-likelihood loss is used to train the entire network in an end-to-end fashion by SGD with back-propagation. Several experiments on the CUB-200-2011, Stanford Cars and Aircraft datasets demonstrate that the proposed method performs favorably against the state of the art.

preprint2020arXiv

G-TAD: Sub-Graph Localization for Temporal Action Detection

Temporal action detection is a fundamental yet challenging task in video understanding. Video context is a critical cue to effectively detect actions, but current works mainly focus on temporal context, while neglecting semantic context as well as other important context properties. In this work, we propose a graph convolutional network (GCN) model to adaptively incorporate multi-level semantic context into video features and cast temporal action detection as a sub-graph localization problem. Specifically, we formulate video snippets as graph nodes, snippet-snippet correlations as edges, and actions associated with context as target sub-graphs. With graph convolution as the basic operation, we design a GCN block called GCNeXt, which learns the features of each node by aggregating its context and dynamically updates the edges in the graph. To localize each sub-graph, we also design an SGAlign layer to embed each sub-graph into the Euclidean space. Extensive experiments show that G-TAD is capable of finding effective video context without extra supervision and achieves state-of-the-art performance on two detection benchmarks. On ActivityNet-1.3, it obtains an average mAP of 34.09%; on THUMOS14, it reaches 51.6% at IoU@0.5 when combined with a proposal processing method. G-TAD code is publicly available at https://github.com/frostinassiky/gtad.

preprint2020arXiv

MacPherson's Conjecture via Hörmander Estimate

In these notes we reprove MacPherson's conjecture on $L^2$-$(n,q)$-cohomology through Demailly's formulation of Hörmander's estimate. This approach allows us to weaken the condition of local semipositivity in Ruppenthal's $L^2$-representation of the adjoint bundle. Moreover, we prove MacPherson's conjecture for a twisted coefficient bundle under an arbitrary singular hermitian metric. As applications, we study the MacPherson-type problem for the pluri-canonical bundle.

preprint2020arXiv

Potassium Isotope Compositions of Carbonaceous and Ordinary Chondrites: Implications on the Origin of Volatile Depletion in the Early Solar System

Solar system materials are variably depleted in moderately volatile elements (MVEs) relative to the proto-solar composition. To address the origin of this MVE depletion, we conducted a systematic study of high-precision K isotopic composition on 16 carbonaceous chondrites (CCs) of types CM1-2, CO3, CV3, CR2, CK4-5 and CH3 and 28 ordinary chondrites (OCs) covering petrological types 3 to 6 and chemical groups H, L, and LL. We observed significant overall K isotope (delta41K) variations (-1.54 to 0.70 permil). The K isotope compositions of CCs are largely higher than the Bulk Silicate Earth (BSE) value, whereas OCs show typically lower values than BSE. Neither CCs nor OCs show resolvable correlations between K isotopes and chemical groups, petrological types, shock levels, exposure ages, fall or find occurrence, or terrestrial weathering. The lack of a clear trend between K isotopes and K content indicates that the K isotope fractionations were decoupled from the relative elemental K depletions. The range of K isotope variations in the CCs is consistent with a four-component (chondrule, refractory inclusion, matrix and water) mixing model that is able to explain the bulk elemental and isotopic compositions of the main CC groups, but requires a fractionation in K isotopic compositions in chondrules. We propose that the major control of the isotopic compositions of group averages is condensation or vaporization in nebular environments that is preserved in the compositional variation of chondrules. Parent-body processes (aqueous alteration, thermal metamorphism, and metasomatism) can mobilize K and affect the K isotopes in individual samples. In the case of the OCs, the full range of K isotopic variations can only be explained by the combined effects of the size and relative abundances of chondrules, parent-body aqueous and thermal alteration.

preprint2020arXiv

Potassium Isotopic Compositions of Enstatite Meteorites

Enstatite chondrites and aubrites are meteorites that show the closest similarities to the Earth in many isotope systems that undergo mass-independent and mass-dependent isotope fractionations. Due to the analytical challenges of obtaining high-precision K isotopic compositions in the past, potential differences in K isotopic compositions between enstatite meteorites and the Earth remained uncertain. We report the first high-precision K isotopic compositions of eight enstatite chondrites and four aubrites and find that there is a significant variation of K isotopic compositions among enstatite meteorites (from -2.34 permil to -0.18 permil). However, K isotopic compositions of nearly all enstatite meteorites scatter around the Bulk Silicate Earth (BSE) value. The average K isotopic composition of the eight enstatite chondrites (-0.47 +/- 0.57 permil) is indistinguishable from the BSE value (-0.48 +/- 0.03 permil), thus further corroborating the isotopic similarity between Earth's building blocks and enstatite meteorite precursors. We found no correlation of K isotopic compositions with the chemical groups, petrological types, shock degrees, and terrestrial weathering conditions; however, the variation of K isotopes among enstatite meteorites can be attributed to parent-body processing. Our sample of the main group aubrite MIL 13004 is exceptional and has an extremely light K isotopic composition (delta 41K = -2.34 +/- 0.12 permil). We attribute this unique K isotopic feature to the presence of abundant djerfisherite inclusions in our sample because this K-bearing sulfide mineral is predicted to be enriched in 39K during equilibrium exchange with silicates.

preprint2020arXiv

Structure and overstability of resistive modes with runaway electrons

We investigate the effects of runaway electron current on the dispersion relation of resistive magnetohydrodynamic modes in tokamaks. We present a new theoretical model to derive the dispersion relation, which is based on the asymptotic analysis of the resistive layer structure of the modes. It is found that in addition to the conventional resistive layer, a new runaway current layer can emerge whose properties depend on the ratio of the Alfvén velocity to the runaway electron convection speed. Due to the contribution from this layer, both the tearing mode and kink mode will have a real frequency in addition to a growth rate. The derived dispersion relation has been compared with numerical results using both a simplified eigenvalue calculation and an M3D-C1 linear simulation, and good agreement is found in both cases.

preprint2020arXiv

Wetting equilibrium in a rectangular channel

When a capillary channel with corners is wetted by a fluid, there are regions where the fluid fills the whole cross-section and regions where only the corners are filled by the fluid. The fluid fraction of the partially-filled region, $s^*$, is an important quantity related to the capillary pressure. We calculate the value of $s^*$ for channels with a cross-section that deviates slightly from a rectangle: the height is larger in the center than at the two short sides. We find that a small change in the cross-section geometry leads to a huge change of $s^*$. This result is consistent with experimental observations.

preprint2019arXiv

Memories in the Photoluminescence Intermittency of Single Cesium Lead Bromide Nanocrystals

Single cesium lead bromide (CsPbBr3) nanocrystals show strong photoluminescence blinking, with on- and off-dwelling times following power-law distributions. We investigate the memory effect in the photoluminescence blinking of single CsPbBr3 nanocrystals and find positive correlations for successive on-times and successive off-times. This memory effect is not sensitive to the nature of the surface capping ligand and the embedding polymer. These observations suggest that photoluminescence intermittency and its memory are mainly controlled by intrinsic traps in the nanocrystals. These findings will help optimize light-emitting devices based on inorganic perovskite nanocrystals.

preprint2016arXiv

Fundamental building blocks of controlling complex networks: A universal controllability framework

To understand the controllability of complex networks is a forefront problem relevant to different fields of science and engineering. Despite recent advances in network controllability theories, an outstanding issue is to understand the effect of network topology and nodal interactions on the controllability at the most fundamental level. Here we develop a universal framework based on local information only to unearth the most fundamental building blocks that determine the controllability. In particular, we introduce a network dissection process to fully unveil the origin of the role of individual nodes and links in control, giving rise to a criterion for the much needed strong structural controllability. We theoretically uncover various phase-transition phenomena associated with the role of nodes and links and strong structural controllability. Applying our theory to a large number of empirical networks demonstrates that technological networks are more strongly structurally controllable (SSC) than many social and biological networks, and real world networks are generally much more SSC than their random counterparts with intrinsic resilience and adaptability as a result of human design and natural evolution.

preprint2016arXiv

LightNet: A Versatile, Standalone Matlab-based Environment for Deep Learning

LightNet is a lightweight, versatile and purely Matlab-based deep learning framework. The idea underlying its design is to provide an easy-to-understand, easy-to-use and efficient computational platform for deep learning research. The implemented framework supports major deep learning architectures such as Multilayer Perceptron Networks (MLP), Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). The framework also supports both CPU and GPU computation, and the switch between them is straightforward. Different applications in computer vision, natural language processing and robotics are demonstrated as experiments.

preprint2014arXiv

Image Compressive Sensing Recovery Using Adaptively Learned Sparsifying Basis via L0 Minimization

From many fewer acquired measurements than suggested by the Nyquist sampling theory, compressive sensing (CS) theory demonstrates that a signal can be reconstructed with high probability when it exhibits sparsity in some domain. Most of the conventional CS recovery approaches, however, exploit a set of fixed bases (e.g. DCT, wavelet and gradient domain) for the entirety of a signal, which ignore the non-stationarity of natural signals and cannot achieve a high enough degree of sparsity, thus resulting in poor CS recovery performance. In this paper, we propose a new framework for image compressive sensing recovery using an adaptively learned sparsifying basis via L0 minimization. The intrinsic sparsity of natural images is enforced substantially by sparsely representing overlapped image patches using the adaptively learned sparsifying basis in the form of the L0 norm, greatly reducing blocking artifacts and confining the CS solution space. To make our proposed scheme tractable and robust, a split Bregman iteration based technique is developed to solve the non-convex L0 minimization problem efficiently. Experimental results on a wide range of natural images for CS recovery have shown that our proposed algorithm achieves significant performance improvements over many current state-of-the-art schemes and exhibits good convergence properties.

preprint2014arXiv

Individual dynamics induces symmetry in network controllability

Controlling complex networked systems to a desired state is a key research goal in contemporary science. Despite recent advances in studying the impact of network topology on controllability, a comprehensive understanding of the synergistic effect of network topology and individual dynamics on controllability is still lacking. Here we offer a theoretical study with particular interest in the diversity of dynamic units characterized by different types of individual dynamics. Interestingly, we find a global symmetry accounting for the invariance of controllability with respect to exchanging the densities of any two different types of dynamic units, irrespective of the network topology. The highest controllability arises at the global symmetry point, at which different types of dynamic units are of the same density. The lowest controllability occurs when all self-loops are either completely absent or present with identical weights. These findings further improve our understanding of network controllability and have implications for devising the optimal control of complex networked systems in a wide range of fields.

preprint2013arXiv

Exact Controllability of Complex Networks

Controlling complex networks is of paramount importance in science and engineering. Despite the recent development of structural-controllability theory, we continue to lack a framework to control undirected complex networks, especially given link weights. Here we introduce an exact-controllability paradigm based on the maximum multiplicity to identify the minimum set of driver nodes required to achieve full control of networks with arbitrary structures and link-weight distributions. The framework reproduces the structural controllability of directed networks characterized by structural matrices. We explore the controllability of a large number of real and model networks, finding that dense networks with identical weights are difficult to control. An efficient and accurate tool is offered to assess the controllability of large sparse and dense networks. The exact-controllability framework enables a comprehensive understanding of the impact of network properties on controllability, a fundamental problem towards our ultimate control of complex systems.
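The abstract only says that the number of driver nodes follows from a "maximum multiplicity". One plausible reading, assumed here and not confirmed by the text above, is that the minimum number of driver nodes equals the largest geometric multiplicity of an eigenvalue of the coupling matrix $A$, i.e. the maximum over eigenvalues $\lambda$ of $N - \mathrm{rank}(\lambda I - A)$. The sketch below evaluates that quantity with exact rational arithmetic; the function names are hypothetical, and the candidate eigenvalues must be supplied by the caller rather than computed.

```python
from fractions import Fraction


def matrix_rank(matrix):
    """Rank via Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, len(rows)):
            factor = rows[i][col] / rows[rank][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def geometric_multiplicity(adj, lam):
    """N - rank(lam * I - A) for a square matrix A given as nested lists."""
    n = len(adj)
    shifted = [[(Fraction(lam) if i == j else 0) - Fraction(adj[i][j])
                for j in range(n)] for i in range(n)]
    return n - matrix_rank(shifted)


def min_driver_nodes(adj, eigenvalues):
    """Largest geometric multiplicity over the supplied eigenvalues (assumed rule)."""
    return max(1, max(geometric_multiplicity(adj, lam) for lam in eigenvalues))


if __name__ == "__main__":
    # Complete graph on 4 nodes with identical unit weights; eigenvalues are 3 and -1.
    k4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
    print(min_driver_nodes(k4, [3, -1]))
```

For this small, fully connected, identically weighted example the eigenvalue -1 has multiplicity 3, so three of the four nodes would need independent inputs under the assumed rule, in line with the abstract's remark that dense networks with identical weights are difficult to control.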

preprint2012arXiv

Exploiting Image Local And Nonlocal Consistency For Mixed Gaussian-Impulse Noise Removal

Most existing image denoising algorithms can only deal with a single type of noise, which ignores the fact that noisy images observed in practice often suffer from more than one type of noise during acquisition and transmission. In this paper, we propose a new variational algorithm for mixed Gaussian-impulse noise removal by exploiting image local consistency and nonlocal consistency simultaneously. Specifically, the local consistency is measured by a hyper-Laplace prior, enforcing the local smoothness of images, while the nonlocal consistency is measured by three-dimensional sparsity of similar blocks, enforcing the nonlocal self-similarity of natural images. Moreover, a Split-Bregman based technique is developed to solve the above optimization problem efficiently. Extensive experiments for mixed Gaussian plus impulse noise show that significant performance improvements over the current state-of-the-art schemes have been achieved, which substantiates the effectiveness of the proposed algorithm.

preprint2012arXiv

Image Super-Resolution via Dual-Dictionary Learning And Sparse Representation

Learning-based image super-resolution aims to reconstruct high-frequency (HF) details from the prior model trained by a set of high- and low-resolution image patches. In this paper, HF to be estimated is considered as a combination of two components: main high-frequency (MHF) and residual high-frequency (RHF), and we propose a novel image super-resolution method via dual-dictionary learning and sparse representation, which consists of the main dictionary learning and the residual dictionary learning, to recover MHF and RHF respectively. Extensive experimental results on test images validate that by employing the proposed two-layer progressive scheme, more image details can be recovered and much better results can be achieved than the state-of-the-art algorithms in terms of both PSNR and visual perception.

41 published item(s)

Correct Code, Vulnerable Dependencies: A Large Scale Measurement Study of LLM-Specified Library Versions

Harnessing LLM Agents with Skill Programs

Histopathology-centered Computational Evolution of Spatial Omics: Integration, Mapping, and Foundation Models

Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems

The Unlearnability Phenomenon in RLVR for Language Models

YOLO-IOD: Towards Real Time Incremental Object Detection

Active information, missing data and prevalence estimation

Adaptive Fairness-Aware Online Meta-Learning for Changing Environments

Capillary rising in a tube with corners

Ego4D: Around the World in 3,000 Hours of Egocentric Video

End-to-End Active Speaker Detection

Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects

Layer Adaptive Deep Neural Networks for Out-of-distribution Detection

MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

R-DFCIL: Relation-Guided Representation Learning for Data-Free Class Incremental Learning

Unsupervised Learning of 3D Semantic Keypoints with Mutual Reconstruction

When NAS Meets Trees: An Efficient Algorithm for Neural Architecture Search

$L^2$-representation of Hodge Modules

A Deep Learning-Based Approach to Extracting Periosteal and Endosteal Contours of Proximal Femur in Quantitative CT Images

A Deep Learning-based Method to Extract Lumen and Media-Adventitia in Intravascular Ultrasound Images

A new approach to extracting coronary arteries and detecting stenosis in invasive coronary angiograms

A Primal-Dual Subgradient Approach for Fair Meta Learning

Wetting dynamics in an angular channel

3D Fusion between Fluoroscopy Angiograms and SPECT Myocardial Perfusion Images to Guide Percutaneous Coronary Intervention

A Deep Learning-Based Method for Automatic Segmentation of Proximal Femur from Quantitative Computed Tomography Images

A Novel Method for ECG Signal Classification via One-Dimensional Convolutional Neural Network

Attention Convolutional Binary Neural Tree for Fine-Grained Visual Categorization

G-TAD: Sub-Graph Localization for Temporal Action Detection

MacPherson's Conjecture via Hörmander Estimate

Potassium Isotope Compositions of Carbonaceous and Ordinary Chondrites: Implications on the Origin of Volatile Depletion in the Early Solar System

Potassium Isotopic Compositions of Enstatite Meteorites

Structure and overstability of resistive modes with runaway electrons

Wetting equilibrium in a rectangular channel

Memories in the Photoluminescence Intermittency of Single Cesium Lead Bromide Nanocrystals

Fundamental building blocks of controlling complex networks: A universal controllability framework

LightNet: A Versatile, Standalone Matlab-based Environment for Deep Learning

Image Compressive Sensing Recovery Using Adaptively Learned Sparsifying Basis via L0 Minimization

Individual dynamics induces symmetry in network controllability

Exact Controllability of Complex Networks

Exploiting Image Local And Nonlocal Consistency For Mixed Gaussian-Impulse Noise Removal

Image Super-Resolution via Dual-Dictionary Learning And Sparse Representation