Researcher profile

Dongxiao Zhu

Dongxiao Zhu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
13works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

13 published item(s)

preprint2026arXiv

Robustness of Transformer-Based Fluence Map Prediction Under Clinically Realistic Perturbations

Learning-based fluence map prediction offers a fast alternative to iterative inverse planning in intensity-modulated radiation therapy (IMRT), but its robustness under realistic distribution shifts remains unclear. We study a two-stage transformer pipeline that maps anatomy (CT and contours) to dose and then to beamlet fluence maps. We compare fluence-stage transformer backbones with hierarchical, global, and hybrid attention, trained with a physics-informed loss enforcing energy consistency. Robustness is evaluated under geometric perturbations, radiometric noise, reduced training data, and domain shifts using a prostate IMRT dataset, with additional evaluation of the dose stage on public datasets. Results show smooth degradation under moderate perturbations but sharp failures under severe rotations and noise. Hierarchical transformers (e.g., SwinUNETR) exhibit slower growth in upper-quartile energy error, indicating improved robustness. We further show that SSIM alone fails to capture clinically relevant errors, highlighting the need for physics-informed evaluation.

preprint2025arXiv

Not All Tokens Are Meant to Be Forgotten

Large Language Models (LLMs), pre-trained on massive text corpora, exhibit remarkable human-level language understanding, reasoning, and decision-making abilities. However, they tend to memorize unwanted information, such as private or copyrighted content, raising significant privacy and legal concerns. Unlearning has emerged as a promising solution, but existing methods face a significant challenge of over-forgetting. This issue arises because they indiscriminately suppress the generation of all the tokens in forget samples, leading to a substantial loss of model utility. To overcome this challenge, we introduce the Targeted Information Forgetting (TIF) framework, which consists of (1) a flexible targeted information identifier designed to differentiate between unwanted words (UW) and general words (GW) in the forget samples, and (2) a novel Targeted Preference Optimization approach that leverages Logit Preference Loss to unlearn unwanted information associated with UW and Preservation Loss to retain general information in GW, effectively improving the unlearning process while mitigating utility degradation. Extensive experiments on the TOFU and MUSE benchmarks demonstrate that the proposed TIF framework enhances unlearning effectiveness while preserving model utility and achieving state-of-the-art results.

preprint2022arXiv

Saliency Guided Adversarial Training for Learning Generalizable Features with Applications to Medical Imaging Classification System

This work tackles a central machine learning problem of performance degradation on out-of-distribution (OOD) test sets. The problem is particularly salient in medical imaging based diagnosis system that appears to be accurate but fails when tested in new hospitals/datasets. Recent studies indicate the system might learn shortcut and non-relevant features instead of generalizable features, so-called good features. We hypothesize that adversarial training can eliminate shortcut features whereas saliency guided training can filter out non-relevant features; both are nuisance features accounting for the performance degradation on OOD test sets. With that, we formulate a novel model training scheme for the deep neural network to learn good features for classification and/or detection tasks ensuring a consistent generalization performance on OOD test sets. The experimental results qualitatively and quantitatively demonstrate the superior performance of our method using the benchmark CXR image data sets on classification tasks.

preprint2021arXiv

Adversarially Robust and Explainable Model Compression with On-Device Personalization for Text Classification

On-device Deep Neural Networks (DNNs) have recently gained more attention due to the increasing computing power of the mobile devices and the number of applications in Computer Vision (CV), Natural Language Processing (NLP), and Internet of Things (IoTs). Unfortunately, the existing efficient convolutional neural network (CNN) architectures designed for CV tasks are not directly applicable to NLP tasks and the tiny Recurrent Neural Network (RNN) architectures have been designed primarily for IoT applications. In NLP applications, although model compression has seen initial success in on-device text classification, there are at least three major challenges yet to be addressed: adversarial robustness, explainability, and personalization. Here we attempt to tackle these challenges by designing a new training scheme for model compression and adversarial robustness, including the optimization of an explainable feature mapping objective, a knowledge distillation objective, and an adversarially robustness objective. The resulting compressed model is personalized using on-device private training data via fine-tuning. We perform extensive experiments to compare our approach with both compact RNN (e.g., FastGRNN) and compressed RNN (e.g., PRADO) architectures in both natural and adversarial NLP test settings.

preprint2020arXiv

COVID-MobileXpert: On-Device COVID-19 Patient Triage and Follow-up using Chest X-rays

During the COVID-19 pandemic, there has been an emerging need for rapid, dedicated, and point-of-care COVID-19 patient disposition techniques to optimize resource utilization and clinical workflow. In view of this need, we present COVID-MobileXpert: a lightweight deep neural network (DNN) based mobile app that can use chest X-ray (CXR) for COVID-19 case screening and radiological trajectory prediction. We design and implement a novel three-player knowledge transfer and distillation (KTD) framework including a pre-trained attending physician (AP) network that extracts CXR imaging features from a large scale of lung disease CXR images, a fine-tuned resident fellow (RF) network that learns the essential CXR imaging features to discriminate COVID-19 from pneumonia and/or normal cases with a small amount of COVID-19 cases, and a trained lightweight medical student (MS) network to perform on-device COVID-19 patient triage and follow-up. To tackle the challenge of vastly similar and dominant fore- and background in medical images, we employ novel loss functions and training schemes for the MS network to learn the robust features. We demonstrate the significant potential of COVID-MobileXpert for rapid deployment via extensive experiments with diverse MS architecture and tuning parameter settings. The source codes for cloud and mobile based models are available from the following url: https://github.com/xinli0928/COVID-Xray.

preprint2020arXiv

CRCEN: A Generalized Cost-sensitive Neural Network Approach for Imbalanced Classification

Classification on imbalanced datasets is a challenging task in real-world applications. Training conventional classification algorithms directly by minimizing classification error in this scenario can compromise model performance for minority class while optimizing performance for majority class. Traditional approaches to the imbalance problem include re-sampling and cost-sensitive methods. In this paper, we propose a neural network model with novel loss function, CRCEN, for imbalanced classification. Based on the weighted version of cross entropy loss, we provide a theoretical relation for model predicted probability, imbalance ratio and the weighting mechanism. To demonstrate the effectiveness of our proposed model, CRCEN is tested on several benchmark datasets and compared with baseline models.

preprint2020arXiv

Defending against adversarial attacks on medical imaging AI system, classification or detection?

Medical imaging AI systems such as disease classification and segmentation are increasingly inspired and transformed from computer vision based AI systems. Although an array of adversarial training and/or loss function based defense techniques have been developed and proved to be effective in computer vision, defending against adversarial attacks on medical images remains largely an uncharted territory due to the following unique challenges: 1) label scarcity in medical images significantly limits adversarial generalizability of the AI system; 2) vastly similar and dominant fore- and background in medical images make it hard samples for learning the discriminating features between different disease classes; and 3) crafted adversarial noises added to the entire medical image as opposed to the focused organ target can make clean and adversarial examples more discriminate than that between different disease classes. In this paper, we propose a novel robust medical imaging AI framework based on Semi-Supervised Adversarial Training (SSAT) and Unsupervised Adversarial Detection (UAD), followed by designing a new measure for assessing systems adversarial risk. We systematically demonstrate the advantages of our robust medical imaging AI system over the existing adversarial defense techniques under diverse real-world settings of adversarial attacks using a benchmark OCT imaging data set.

preprint2020arXiv

Explainable Recommendation via Interpretable Feature Mapping and Evaluation of Explainability

Latent factor collaborative filtering (CF) has been a widely used technique for recommender system by learning the semantic representations of users and items. Recently, explainable recommendation has attracted much attention from research community. However, trade-off exists between explainability and performance of the recommendation where metadata is often needed to alleviate the dilemma. We present a novel feature mapping approach that maps the uninterpretable general features onto the interpretable aspect features, achieving both satisfactory accuracy and explainability in the recommendations by simultaneous minimization of rating prediction loss and interpretation loss. To evaluate the explainability, we propose two new evaluation metrics specifically designed for aspect-level explanation using surrogate ground truth. Experimental results demonstrate a strong performance in both recommendation and explaining explanation, eliminating the need for metadata. Code is available from https://github.com/pd90506/AMCF.

preprint2020arXiv

Improve SGD Training via Aligning Mini-batches

Deep neural networks (DNNs) for supervised learning can be viewed as a pipeline of a feature extractor (i.e. last hidden layer) and a linear classifier (i.e. output layer) that is trained jointly with stochastic gradient descent (SGD). In each iteration of SGD, a mini-batch from the training data is sampled and the true gradient of the loss function is estimated as the noisy gradient calculated on this mini-batch. From the feature learning perspective, the feature extractor should be updated to learn meaningful features with respect to the entire data, and reduce the accommodation to noise in the mini-batch. With this motivation, we propose In-Training Distribution Matching (ITDM) to improve DNN training and reduce overfitting. Specifically, along with the loss function, ITDM regularizes the feature extractor by matching the moments of distributions of different mini-batches in each iteration of SGD, which is fulfilled by minimizing the maximum mean discrepancy. As such, ITDM does not assume any explicit parametric form of data distribution in the latent feature space. Extensive experiments are conducted to demonstrate the effectiveness of our proposed strategy.

preprint2020arXiv

On the Learning Property of Logistic and Softmax Losses for Deep Neural Networks

Deep convolutional neural networks (CNNs) trained with logistic and softmax losses have made significant advancement in visual recognition tasks in computer vision. When training data exhibit class imbalances, the class-wise reweighted version of logistic and softmax losses are often used to boost performance of the unweighted version. In this paper, motivated to explain the reweighting mechanism, we explicate the learning property of those two loss functions by analyzing the necessary condition (e.g., gradient equals to zero) after training CNNs to converge to a local minimum. The analysis immediately provides us explanations for understanding (1) quantitative effects of the class-wise reweighting mechanism: deterministic effectiveness for binary classification using logistic loss yet indeterministic for multi-class classification using softmax loss; (2) disadvantage of logistic loss for single-label multi-class classification via one-vs.-all approach, which is due to the averaging effect on predicted probabilities for the negative class (e.g., non-target classes) in the learning process. With the disadvantage and advantage of logistic loss disentangled, we thereafter propose a novel reweighted logistic loss for multi-class classification. Our simple yet effective formulation improves ordinary logistic loss by focusing on learning hard non-target classes (target vs. non-target class in one-vs.-all) and turned out to be competitive with softmax loss. We evaluate our method on several benchmark datasets to demonstrate its effectiveness.

preprint2020arXiv

Tackling Ordinal Regression Problem for Heterogeneous Data: Sparse and Deep Multi-Task Learning Approaches

Many real-world datasets are labeled with natural orders, i.e., ordinal labels. Ordinal regression is a method to predict ordinal labels that finds a wide range of applications in data-rich domains, such as natural, health and social sciences. Most existing ordinal regression approaches work well for independent and identically distributed (IID) instances via formulating a single ordinal regression task. However, for heterogeneous non-IID instances with well-defined local geometric structures, e.g., subpopulation groups, multi-task learning (MTL) provides a promising framework to encode task (subgroup) relatedness, bridge data from all tasks, and simultaneously learn multiple related tasks in efforts to improve generalization performance. Even though MTL methods have been extensively studied, there is barely existing work investigating MTL for heterogeneous data with ordinal labels. We tackle this important problem via sparse and deep multi-task approaches. Specifically, we develop a regularized multi-task ordinal regression (MTOR) model for smaller datasets and a deep neural networks based MTOR model for large-scale datasets. We evaluate the performance using three real-world healthcare datasets with applications to multi-stage disease progression diagnosis. Our experiments indicate that the proposed MTOR models markedly improve the prediction performance comparing with single-task ordinal regression models.

preprint2020arXiv

Toward Tag-free Aspect Based Sentiment Analysis: A Multiple Attention Network Approach

Existing aspect based sentiment analysis (ABSA) approaches leverage various neural network models to extract the aspect sentiments via learning aspect-specific feature representations. However, these approaches heavily rely on manual tagging of user reviews according to the predefined aspects as the input, a laborious and time-consuming process. Moreover, the underlying methods do not explain how and why the opposing aspect level polarities in a user review lead to the overall polarity. In this paper, we tackle these two problems by designing and implementing a new Multiple-Attention Network (MAN) approach for more powerful ABSA without the need for aspect tags using two new tag-free data sets crawled directly from TripAdvisor ({https://www.tripadvisor.com}). With the Self- and Position-Aware attention mechanism, MAN is capable of extracting both aspect level and overall sentiments from the text reviews using the aspect level and overall customer ratings, and it can also detect the vital aspect(s) leading to the overall sentiment polarity among different aspects via a new aspect ranking scheme. We carry out extensive experiments to demonstrate the strong performance of MAN compared to other state-of-the-art ABSA approaches and the explainability of our approach by visualizing and interpreting attention weights in case studies.

preprint2020arXiv

Vispi: Automatic Visual Perception and Interpretation of Chest X-rays

Medical imaging contains the essential information for rendering diagnostic and treatment decisions. Inspecting (visual perception) and interpreting image to generate a report are tedious clinical routines for a radiologist where automation is expected to greatly reduce the workload. Despite rapid development of natural image captioning, computer-aided medical image visual perception and interpretation remain a challenging task, largely due to the lack of high-quality annotated image-report pairs and tailor-made generative models for sufficient extraction and exploitation of localized semantic features, particularly those associated with abnormalities. To tackle these challenges, we present Vispi, an automatic medical image interpretation system, which first annotates an image via classifying and localizing common thoracic diseases with visual support and then followed by report generation from an attentive LSTM model. Analyzing an open IU X-ray dataset, we demonstrate a superior performance of Vispi in disease classification, localization and report generation using automatic performance evaluation metrics ROUGE and CIDEr.