Researcher profile

Xiaofeng Liu

Xiaofeng Liu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
27works
0followers
14topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

27 published item(s)

preprint2026arXiv

Finite Volume-Informed Neural Network Framework for 2D Shallow Water Equations: Rugged Loss Landscapes and the Importance of Data Guidance

Physics-informed neural networks (PINNs) are a simple surrogate-modelling paradigm for partial differential equations, but their standard strong-form residual formulation is ill suited to the shallow water equations (SWE). It cannot enforce local conservation, handle discontinuities, or leverage the boundary-conforming unstructured meshes used in real-world applications. We introduce ``Data-Guided FVM-PINN'', a framework that replaces the strong-form residual with a differentiable, well-balanced Roe Riemann-solver finite-volume (FVM) loss evaluated on unstructured meshes. The major finding is that physics-only FVM-PINN training often fails on realistic 2D problems: the network collapses to a trivial low-momentum state that nearly satisfies the FVM-PINN residual but bears no resemblance to the true flow. A loss-landscape diagnostic shows that the FVM-PINN loss at zero momentum is only about $7\times$ larger than at the trained solution, a shallow basin that an ordinary optimizer falls into; adding even sparse data turns this into a $310\times$ separation, breaking the degeneracy. On a 2D block-in-channel benchmark, just $200$ random velocity measurements drop the velocity-field $L_2$ error by $22\times$ versus physics-only; $50$ measurements still deliver a $7\times$ reduction. A controlled ablation isolates the contribution of the FVM-PINN loss: it reduces velocity-field $L_2$ by $\sim$$23\%$ in the sparse-data regime and is essentially neutral when dense reference data is available. On a real-world Savannah River reach ($1306$ cells, $3600$~s simulation, five Manning zones), the framework constructs an accurate surrogate from SRH-2D anchor data, with time-window decomposition reducing error monotonically via progressive initial-condition handoff.

preprint2026arXiv

MODE: Efficient Time Series Prediction with Mamba Enhanced by Low-Rank Neural ODEs

Time series prediction plays a pivotal role across diverse domains such as finance, healthcare, energy systems, and environmental modeling. However, existing approaches often struggle to balance efficiency, scalability, and accuracy, particularly when handling long-range dependencies and irregularly sampled data. To address these challenges, we propose MODE, a unified framework that integrates Low-Rank Neural Ordinary Differential Equations (Neural ODEs) with an Enhanced Mamba architecture. As illustrated in our framework, the input sequence is first transformed by a Linear Tokenization Layer and then processed through multiple Mamba Encoder blocks, each equipped with an Enhanced Mamba Layer that employs Causal Convolution, SiLU activation, and a Low-Rank Neural ODE enhancement to efficiently capture temporal dynamics. This low-rank formulation reduces computational overhead while maintaining expressive power. Furthermore, a segmented selective scanning mechanism, inspired by pseudo-ODE dynamics, adaptively focuses on salient subsequences to improve scalability and long-range sequence modeling. Extensive experiments on benchmark datasets demonstrate that MODE surpasses existing baselines in both predictive accuracy and computational efficiency. Overall, our contributions include: (1) a unified and efficient architecture for long-term time series modeling, (2) integration of Mamba's selective scanning with low-rank Neural ODEs for enhanced temporal representation, and (3) substantial improvements in efficiency and scalability enabled by low-rank approximation and dynamic selective scanning.

preprint2026arXiv

Physics-Informed Deep Operator Learning for Computational Hydraulics Modeling

Traditional 2D hydraulic models face significant computational challenges that limit their applications that are time-sensitive or require many model evaluations. This study presents a physics-informed Deep Operator Network (DeepONet) framework for computational hydraulics modeling that learns the solution operator of the 2D shallow water equations (SWEs) to create fast surrogate models. The framework can operate in two modes: a purely data-driven SWE-DeepONet that learns from numerical solver such as SRH-2D, and a physics-informed PI-SWE-DeepONet that additionally incorporates the continuous SWEs as constraints during training. Based on a real-world case, steady flows in a reach of the Sacramento River in California, it is demonstrated that PI-SWE-DeepONet possesses much enhanced prediction capability than SWE-DeepONet when applied to out-of-distribution scenarios. The physics-informed model is shown to exhibit slower error growth and larger breakdown distances in comparison with SWE-DeepONet. The gain of the physics-informed training, however, comes with costs, chief among which are the simulated results have slightly higher errors for in-distribution cases. It reflects the existence of a tension between the two competing training objectives: fitting the results from the traditional hydraulic model and satisfying the continuous governing equations. In this study, guidelines are developed for selecting the appropriate approach based on a real-world case: PI-SWE-DeepONet is preferred for out-of-distribution predictions, uncertain training data, or when physical consistency is a priority, while SWE-DeepONet is recommended if the modeling objective is to replicate faithfully the traditional hydraulic model results within the training distribution. Other challenges are also discussed, such as the loss weighting approach.

preprint2022arXiv

A Multi-modal Fusion Framework Based on Multi-task Correlation Learning for Cancer Prognosis Prediction

Morphological attributes from histopathological images and molecular profiles from genomic data are important information to drive diagnosis, prognosis, and therapy of cancers. By integrating these heterogeneous but complementary data, many multi-modal methods are proposed to study the complex mechanisms of cancers, and most of them achieve comparable or better results from previous single-modal methods. However, these multi-modal methods are restricted to a single task (e.g., survival analysis or grade classification), and thus neglect the correlation between different tasks. In this study, we present a multi-modal fusion framework based on multi-task correlation learning (MultiCoFusion) for survival analysis and cancer grade classification, which combines the power of multiple modalities and multiple tasks. Specifically, a pre-trained ResNet-152 and a sparse graph convolutional network (SGCN) are used to learn the representations of histopathological images and mRNA expression data respectively. Then these representations are fused by a fully connected neural network (FCNN), which is also a multi-task shared network. Finally, the results of survival analysis and cancer grade classification output simultaneously. The framework is trained by an alternate scheme. We systematically evaluate our framework using glioma datasets from The Cancer Genome Atlas (TCGA). Results demonstrate that MultiCoFusion learns better representations than traditional feature extraction methods. With the help of multi-task alternating learning, even simple multi-modal concatenation can achieve better performance than other deep learning and traditional methods. Multi-task learning can improve the performance of multiple tasks not just one of them, and it is effective in both single-modal and multi-modal data.

preprint2022arXiv

AI Enlightens Wireless Communication: A Transformer Backbone for CSI Feedback

This paper is based on the background of the 2nd Wireless Communication Artificial Intelligence (AI) Competition (WAIC) which is hosted by IMT-2020(5G) Promotion Group 5G+AIWork Group, where the framework of the eigenvector-based channel state information (CSI) feedback problem is firstly provided. Then a basic Transformer backbone for CSI feedback referred to EVCsiNet-T is proposed. Moreover, a series of potential enhancements for deep learning based (DL-based) CSI feedback including i) data augmentation, ii) loss function design, iii) training strategy, and iv) model ensemble are introduced. The experimental results involving the comparison between EVCsiNet-T and traditional codebook methods over different channels are further provided, which show the advanced performance and a promising prospect of Transformer on DL-based CSI feedback problem.

preprint2022arXiv

Bathymetry Inversion using a Deep-Learning-Based Surrogate for Shallow Water Equations Solvers

River bathymetry is critical for many aspects of water resources management. We propose and demonstrate a bathymetry inversion method using a deep-learning-based surrogate for shallow water equations solvers. The surrogate uses the convolutional autoencoder with a shared-encoder, separate-decoder architecture. It encodes the input bathymetry and decodes to separate outputs for flow-field variables. A gradient-based optimizer is used to perform bathymetry inversion with the trained surrogate. Two physically-based constraints on both bed elevation value and slope have to be added as inversion loss regularizations to obtain usable inversion results. Using the "L-curve" criterion, a heuristic approach was proposed to determine the regularization parameters. Both the surrogate model and the inversion algorithm show good performance. We found the bathymetry inversion process has two distinctive stages, which resembles the sculptural process of initial broad-brush calving and final detailing. The inversion loss due to flow prediction error reaches its minimum in the first stage and remains almost constant afterward. The bed elevation value and slope regularizations play the dominant role in the second stage in selecting the most probable solution. We also found the surrogate architecture (whether with both velocity and water surface elevation or velocity only as outputs) does not show significant impact on inversion result.

preprint2022arXiv

Constraining Pseudo-label in Self-training Unsupervised Domain Adaptation with Energy-based Model

Deep learning is usually data starved, and the unsupervised domain adaptation (UDA) is developed to introduce the knowledge in the labeled source domain to the unlabeled target domain. Recently, deep self-training presents a powerful means for UDA, involving an iterative process of predicting the target domain and then taking the confident predictions as hard pseudo-labels for retraining. However, the pseudo-labels are usually unreliable, thus easily leading to deviated solutions with propagated errors. In this paper, we resort to the energy-based model and constrain the training of the unlabeled target sample with an energy function minimization objective. It can be achieved via a simple additional regularization or an energy-based loss. This framework allows us to gain the benefits of the energy-based model, while retaining strong discriminative performance following a plug-and-play fashion. The convergence property and its connection with classification expectation minimization are investigated. We deliver extensive experiments on the most popular and large-scale UDA benchmarks of image classification as well as semantic segmentation to demonstrate its generality and effectiveness.

preprint2022arXiv

Deep Unsupervised Domain Adaptation: A Review of Recent Advances and Perspectives

Deep learning has become the method of choice to tackle real-world problems in different domains, partly because of its ability to learn from data and achieve impressive performance on a wide range of applications. However, its success usually relies on two assumptions: (i) vast troves of labeled datasets are required for accurate model fitting, and (ii) training and testing data are independent and identically distributed. Its performance on unseen target domains, thus, is not guaranteed, especially when encountering out-of-distribution data at the adaptation stage. The performance drop on data in a target domain is a critical problem in deploying deep neural networks that are successfully trained on data in a source domain. Unsupervised domain adaptation (UDA) is proposed to counter this, by leveraging both labeled source domain data and unlabeled target domain data to carry out various tasks in the target domain. UDA has yielded promising results on natural image processing, video analysis, natural language processing, time-series data analysis, medical image analysis, etc. In this review, as a rapidly evolving topic, we provide a systematic comparison of its methods and applications. In addition, the connection of UDA with its closely related tasks, e.g., domain generalization and out-of-distribution detection, has also been discussed. Furthermore, deficiencies in current methods and possible promising directions are highlighted.

preprint2022arXiv

Feasibility study of clinical target volume definition for soft-tissue sarcoma using muscle fiber orientations derived from diffusion tensor imaging

Objective: Soft-tissue sarcoma spreads preferentially along muscle fibers. We explore the utility of deriving muscle fiber orientations from diffusion tensor MRI (DT-MRI) for defining the boundary of the clinical target volume in muscle tissue. Approach: We recruited eight healthy volunteers to acquire MR images of the left and right thigh. The imaging session consisted of (a) two MRI spin-echo-based scans, T1- and T2-weighted; (b) a diffusion weighted (DW) spin-echo-based scan using an echo planar acquisition with fat suppression. The thigh muscles were auto-segmented using CNN. DT-MRI data was used as a geometry encoding input to solve the anisotropic Eikonal equation with Hamiltonian Fast-Marching method. The isosurfaces of the solution modeled the CTV boundary. Main results: The auto-segmented muscles of the thigh agreed with manually delineated with the Dice score ranging from 0.8 to 0.94 for different muscles. Anisotropy of the isosurfaces was compared across muscles with different anatomical orientations within a thigh, between muscles in left and right thighs of each subject, and between different subjects. Analysis showed a high degree of consistency across all comparisons. The distance from the GTV to the isosurface and the eigenvalues ratio are two controlling parameters for the extent and shape of the CTV. Significance: Our feasibility study with healthy volunteers shows the promise of using muscle fiber orientations derived from diffusion weighted MRI data for automated generation of anisotropic CTV boundary in soft tissue sarcoma. Our contribution is significant as it is expected to lead to the improvements in the treatment outcomes of soft-tissue sarcoma patients undergoing radiotherapy and decrease amputation rate for a subset of patients. We expect such improvements to have a strong positive impact for the cancer centers with small volume of sarcoma patients.

preprint2022arXiv

Self-semantic contour adaptation for cross modality brain tumor segmentation

Unsupervised domain adaptation (UDA) between two significantly disparate domains to learn high-level semantic alignment is a crucial yet challenging task.~To this end, in this work, we propose exploiting low-level edge information to facilitate the adaptation as a precursor task, which has a small cross-domain gap, compared with semantic segmentation.~The precise contour then provides spatial information to guide the semantic adaptation. More specifically, we propose a multi-task framework to learn a contouring adaptation network along with a semantic segmentation adaptation network, which takes both magnetic resonance imaging (MRI) slice and its initial edge map as input.~These two networks are jointly trained with source domain labels, and the feature and edge map level adversarial learning is carried out for cross-domain alignment. In addition, self-entropy minimization is incorporated to further enhance segmentation performance. We evaluated our framework on the BraTS2018 database for cross-modality segmentation of brain tumors, showing the validity and superiority of our approach, compared with competing methods.

preprint2022arXiv

Structure-aware Unsupervised Tagged-to-Cine MRI Synthesis with Self Disentanglement

Cycle reconstruction regularized adversarial training -- e.g., CycleGAN, DiscoGAN, and DualGAN -- has been widely used for image style transfer with unpaired training data. Several recent works, however, have shown that local distortions are frequent, and structural consistency cannot be guaranteed. Targeting this issue, prior works usually relied on additional segmentation or consistent feature extraction steps that are task-specific. To counter this, this work aims to learn a general add-on structural feature extractor, by explicitly enforcing the structural alignment between an input and its synthesized image. Specifically, we propose a novel input-output image patches self-training scheme to achieve a disentanglement of underlying anatomical structures and imaging modalities. The translator and structure encoder are updated, following an alternating training protocol. In addition, the information w.r.t. imaging modality can be eliminated with an asymmetric adversarial game. We train, validate, and test our network on 1,768, 416, and 1,560 unpaired subject-independent slices of tagged and cine magnetic resonance imaging from a total of twenty healthy subjects, respectively, demonstrating superior performance over competing methods.

preprint2022arXiv

Subtype-Aware Dynamic Unsupervised Domain Adaptation

Unsupervised domain adaptation (UDA) has been successfully applied to transfer knowledge from a labeled source domain to target domains without their labels. Recently introduced transferable prototypical networks (TPN) further addresses class-wise conditional alignment. In TPN, while the closeness of class centers between source and target domains is explicitly enforced in a latent space, the underlying fine-grained subtype structure and the cross-domain within-class compactness have not been fully investigated. To counter this, we propose a new approach to adaptively perform a fine-grained subtype-aware alignment to improve performance in the target domain without the subtype label in both domains. The insight of our approach is that the unlabeled subtypes in a class have the local proximity within a subtype, while exhibiting disparate characteristics, because of different conditional and label shifts. Specifically, we propose to simultaneously enforce subtype-wise compactness and class-wise separation, by utilizing intermediate pseudo-labels. In addition, we systematically investigate various scenarios with and without prior knowledge of subtype numbers, and propose to exploit the underlying subtype structure. Furthermore, a dynamic queue framework is developed to evolve the subtype cluster centroids steadily using an alternative processing scheme. Experimental results, carried out with multi-view congenital heart disease data and VisDA and DomainNet, show the effectiveness and validity of our subtype-aware UDA, compared with state-of-the-art UDA methods.

preprint2022arXiv

Tensor Decomposition based Personalized Federated Learning

Federated learning (FL) is a new distributed machine learning framework that can achieve reliably collaborative training without collecting users' private data. However, due to FL's frequent communication and average aggregation strategy, they experience challenges scaling to statistical diversity data and large-scale models. In this paper, we propose a personalized FL framework, named Tensor Decomposition based Personalized Federated learning (TDPFed), in which we design a novel tensorized local model with tensorized linear layers and convolutional layers to reduce the communication cost. TDPFed uses a bi-level loss function to decouple personalized model optimization from the global model learning by controlling the gap between the personalized model and the tensorized local model. Moreover, an effective distributed learning strategy and two different model aggregation strategies are well designed for the proposed TDPFed framework. Theoretical convergence analysis and thorough experiments demonstrate that our proposed TDPFed framework achieves state-of-the-art performance while reducing the communication cost.

preprint2022arXiv

Unsupervised Domain Adaptation for Segmentation with Black-box Source Model

Unsupervised domain adaptation (UDA) has been widely used to transfer knowledge from a labeled source domain to an unlabeled target domain to counter the difficulty of labeling in a new domain. The training of conventional solutions usually relies on the existence of both source and target domain data. However, privacy of the large-scale and well-labeled data in the source domain and trained model parameters can become the major concern of cross center/domain collaborations. In this work, to address this, we propose a practical solution to UDA for segmentation with a black-box segmentation model trained in the source domain only, rather than original source data or a white-box source model. Specifically, we resort to a knowledge distillation scheme with exponential mixup decay (EMD) to gradually learn target-specific representations. In addition, unsupervised entropy minimization is further applied to regularization of the target domain confidence. We evaluated our framework on the BraTS 2018 database, achieving performance on par with white-box source model adaptation approaches.

preprint2022arXiv

Variational Inference for Quantifying Inter-observer Variability in Segmentation of Anatomical Structures

Lesions or organ boundaries visible through medical imaging data are often ambiguous, thus resulting in significant variations in multi-reader delineations, i.e., the source of aleatoric uncertainty. In particular, quantifying the inter-observer variability of manual annotations with Magnetic Resonance (MR) Imaging data plays a crucial role in establishing a reference standard for various diagnosis and treatment tasks. Most segmentation methods, however, simply model a mapping from an image to its single segmentation map and do not take the disagreement of annotators into consideration. In order to account for inter-observer variability, without sacrificing accuracy, we propose a novel variational inference framework to model the distribution of plausible segmentation maps, given a specific MR image, which explicitly represents the multi-reader variability. Specifically, we resort to a latent vector to encode the multi-reader variability and counteract the inherent information loss in the imaging data. Then, we apply a variational autoencoder network and optimize its evidence lower bound (ELBO) to efficiently approximate the distribution of the segmentation map, given an MR image. Experimental results, carried out with the QUBIQ brain growth MRI segmentation datasets with seven annotators, demonstrate the effectiveness of our approach.

preprint2021arXiv

Deep learning-based GTV contouring modeling inter- and intra- observer variability in sarcomas

Background and purpose: The delineation of the gross tumor volume (GTV) is a critical step for radiation therapy treatment planning. The delineation procedure is typically performed manually which exposes two major issues: cost and reproducibility. Delineation is a time-consuming process that is subject to inter- and intra-observer variability. While methods have been proposed to predict GTV contours, typical approaches ignore variability and therefore fail to utilize the valuable confidence information offered by multiple contours. Materials and methods: In this work we propose an automatic GTV contouring method for soft-tissue sarcomas from X-ray computed tomography (CT) images, using deep learning by integrating inter- and intra-observer variability in the learned model. Sixty-eight patients with soft tissue and bone sarcomas were considered in this evaluation, all underwent pre-operative CT imaging used to perform GTV delineation. Four radiation oncologists and radiologists performed three contouring trials each for all patients. We quantify variability by defining confidence levels based on the frequency of inclusion of a given voxel into the GTV and use a deep convolutional neural network to learn GTV confidence maps. Results: Results were compared to confidence maps from the four readers as well as ground-truth consensus contours established jointly by all readers. The resulting continuous Dice score between predicted and true confidence maps was 87% and the Hausdorff distance was 14 mm. Conclusion: Results demonstrate the ability of the proposed method to predict accurate contours while utilizing variability and as such it can be used to improve clinical workflow.

preprint2021arXiv

Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models

AI Safety is a major concern in many deep learning applications such as autonomous driving. Given a trained deep learning model, an important natural problem is how to reliably verify the model's prediction. In this paper, we propose a novel framework -- deep verifier networks (DVN) to verify the inputs and outputs of deep discriminative models with deep generative models. Our proposed model is based on conditional variational auto-encoders with disentanglement constraints. We give both intuitive and theoretical justifications of the model. Our verifier network is trained independently with the prediction model, which eliminates the need of retraining the verifier network for a new model. We test the verifier network on out-of-distribution detection and adversarial example detection problems, as well as anomaly detection problems in structured prediction tasks such as image caption generation. We achieve state-of-the-art results in all of these problems.

preprint2021arXiv

Energy-constrained Self-training for Unsupervised Domain Adaptation

Unsupervised domain adaptation (UDA) aims to transfer the knowledge on a labeled source domain distribution to perform well on an unlabeled target domain. Recently, the deep self-training involves an iterative process of predicting on the target domain and then taking the confident predictions as hard pseudo-labels for retraining. However, the pseudo-labels are usually unreliable, and easily leading to deviated solutions with propagated errors. In this paper, we resort to the energy-based model and constrain the training of the unlabeled target sample with the energy function minimization objective. It can be applied as a simple additional regularization. In this framework, it is possible to gain the benefits of the energy-based model, while retaining strong discriminative performance following a plug-and-play fashion. We deliver extensive experiments on the most popular and large scale UDA benchmarks of image classification as well as semantic segmentation to demonstrate its generality and effectiveness.

preprint2021arXiv

Identity-aware Facial Expression Recognition in Compressed Video

This paper targets to explore the inter-subject variations eliminated facial expression representation in the compressed video domain. Most of the previous methods process the RGB images of a sequence, while the off-the-shelf and valuable expression-related muscle movement already embedded in the compression format. In the up to two orders of magnitude compressed domain, we can explicitly infer the expression from the residual frames and possible to extract identity factors from the I frame with a pre-trained face recognition network. By enforcing the marginal independent of them, the expression feature is expected to be purer for the expression and be robust to identity shifts. We do not need the identity label or multiple expression samples from the same person for identity elimination. Moreover, when the apex frame is annotated in the dataset, the complementary constraint can be further added to regularize the feature-level game. In testing, only the compressed residual frames are required to achieve expression prediction. Our solution can achieve comparable or better performance than the recent decoded image based methods on the typical FER benchmarks with about 3$\times$ faster inference with compressed data.

preprint2021arXiv

Subtype-aware Unsupervised Domain Adaptation for Medical Diagnosis

Recent advances in unsupervised domain adaptation (UDA) show that transferable prototypical learning presents a powerful means for class conditional alignment, which encourages the closeness of cross-domain class centroids. However, the cross-domain inner-class compactness and the underlying fine-grained subtype structure remained largely underexplored. In this work, we propose to adaptively carry out the fine-grained subtype-aware alignment by explicitly enforcing the class-wise separation and subtype-wise compactness with intermediate pseudo labels. Our key insight is that the unlabeled subtypes of a class can be divergent to one another with different conditional and label shifts, while inheriting the local proximity within a subtype. The cases of with or without the prior information on subtype numbers are investigated to discover the underlying subtype structure in an online fashion. The proposed subtype-aware dynamic UDA achieves promising results on medical diagnosis tasks.

preprint2021arXiv

Symmetric-Constrained Irregular Structure Inpainting for Brain MRI Registration with Tumor Pathology

Deformable registration of magnetic resonance images between patients with brain tumors and healthy subjects has been an important tool to specify tumor geometry through location alignment and facilitate pathological analysis. Since tumor region does not match with any ordinary brain tissue, it has been difficult to deformably register a patients brain to a normal one. Many patient images are associated with irregularly distributed lesions, resulting in further distortion of normal tissue structures and complicating registration's similarity measure. In this work, we follow a multi-step context-aware image inpainting framework to generate synthetic tissue intensities in the tumor region. The coarse image-to-image translation is applied to make a rough inference of the missing parts. Then, a feature-level patch-match refinement module is applied to refine the details by modeling the semantic relevance between patch-wise features. A symmetry constraint reflecting a large degree of anatomical symmetry in the brain is further proposed to achieve better structure understanding. Deformable registration is applied between inpainted patient images and normal brains, and the resulting deformation field is eventually used to deform original patient data for the final alignment. The method was applied to the Multimodal Brain Tumor Segmentation (BraTS) 2018 challenge database and compared against three existing inpainting methods. The proposed method yielded results with increased peak signal-to-noise ratio, structural similarity index, inception score, and reduced L1 error, leading to successful patient-to-normal brain image registration.

preprint2021arXiv

Two-Dimensional Bipolar Magnetic Semiconductor with High Curie Temperature and Electrically Controllable Spin Polarization Realized in Exfoliated Cr(pyrazine)$_2$ Monolayer

Exploring two-dimensional (2D) magnetic semiconductors with room temperature magnetic ordering and electrically controllable spin polarization is a highly desirable but challenging task for nanospintronics. Here, through first principles calculations, we propose to realize such a material by exfoliating the recently synthesized organometallic layered crystal Li$_{0.7}$[Cr(pyz)$_2$]Cl$_{0.7}$0.25$\cdot$(THF) (pyz = pyrazine, THF = tetrahydrofuran) [Science 370, 587 (2020)]. The feasibility of exfoliation is confirmed by the rather low exfoliation energy of 0.27 J/m$^2$, even smaller than that of graphite. In exfoliated Cr(pyz)$_2$ monolayer, each pyrazine ring grabs one electron from the Cr atom to become a radical anion, then a strong $d$-$p$ direct exchange magnetic interaction emerges between Cr cations and pyrazine radicals, resulting in room temperature ferrimagnetism with a Curie temperature of 342 K. Moreover, Cr(pyz)$_2$ monolayer is revealed to be an intrinsic bipolar magnetic semiconductor where electrical doping can induce half-metallic conduction with controllable spin-polarization direction.

preprint2021arXiv

VoxelHop: Successive Subspace Learning for ALS Disease Classification Using Structural MRI

Deep learning has great potential for accurate detection and classification of diseases with medical imaging data, but the performance is often limited by the number of training datasets and memory requirements. In addition, many deep learning models are considered a "black-box," thereby often limiting their adoption in clinical applications. To address this, we present a successive subspace learning model, termed VoxelHop, for accurate classification of Amyotrophic Lateral Sclerosis (ALS) using T2-weighted structural MRI data. Compared with popular convolutional neural network (CNN) architectures, VoxelHop has modular and transparent structures with fewer parameters without any backpropagation, so it is well-suited to small dataset size and 3D imaging data. Our VoxelHop has four key components, including (1) sequential expansion of near-to-far neighborhood for multi-channel 3D data; (2) subspace approximation for unsupervised dimension reduction; (3) label-assisted regression for supervised dimension reduction; and (4) concatenation of features and classification between controls and patients. Our experimental results demonstrate that our framework using a total of 20 controls and 26 patients achieves an accuracy of 93.48$\%$ and an AUC score of 0.9394 in differentiating patients from controls, even with a relatively small number of datasets, showing its robustness and effectiveness. Our thorough evaluations also show its validity and superiority to the state-of-the-art 3D CNN classification methods. Our framework can easily be generalized to other classification tasks using different imaging modalities.

preprint2020arXiv

Confidence Regularized Self-Training

Recent advances in domain adaptation show that deep self-training presents a powerful means for unsupervised domain adaptation. These methods often involve an iterative process of predicting on target domain and then taking the confident predictions as pseudo-labels for retraining. However, since pseudo-labels can be noisy, self-training can put overconfident label belief on wrong classes, leading to deviated solutions with propagated errors. To address the problem, we propose a confidence regularized self-training (CRST) framework, formulated as regularized self-training. Our method treats pseudo-labels as continuous latent variables jointly optimized via alternating optimization. We propose two types of confidence regularization: label regularization (LR) and model regularization (MR). CRST-LR generates soft pseudo-labels while CRST-MR encourages the smoothness on network output. Extensive experiments on image classification and semantic segmentation show that CRSTs outperform their non-regularized counterpart with state-of-the-art performance. The code and models of this work are available at https://github.com/yzou2/CRST.

preprint2020arXiv

Disentanglement for Discriminative Visual Recognition

Recent successes of deep learning-based recognition rely on maintaining the content related to the main-task label. However, how to explicitly dispel the noisy signals for better generalization in a controllable manner remains an open issue. For instance, various factors such as identity-specific attributes, pose, illumination and expression affect the appearance of face images. Disentangling the identity-specific factors is potentially beneficial for facial expression recognition (FER). This chapter systematically summarize the detrimental factors as task-relevant/irrelevant semantic variations and unspecified latent variation. In this chapter, these problems are casted as either a deep metric learning problem or an adversarial minimax game in the latent space. For the former choice, a generalized adaptive (N+M)-tuplet clusters loss function together with the identity-aware hard-negative mining and online positive mining scheme can be used for identity-invariant FER. The better FER performance can be achieved by combining the deep metric loss and softmax loss in a unified two fully connected layer branches framework via joint optimization. For the latter solution, it is possible to equipping an end-to-end conditional adversarial network with the ability to decompose an input sample into three complementary parts. The discriminative representation inherits the desired invariance property guided by prior knowledge of the task, which is marginal independent to the task-relevant/irrelevant semantic and latent variations. The framework achieves top performance on a serial of tasks, including lighting, makeup, disguise-tolerant face recognition and facial attributes recognition. This chapter systematically summarize the popular and practical solution for disentanglement to achieve more discriminative visual recognition.

preprint2020arXiv

Reinforced Wasserstein Training for Severity-Aware Semantic Segmentation in Autonomous Driving

Semantic segmentation is important for many real-world systems, e.g., autonomous vehicles, which predict the class of each pixel. Recently, deep networks achieved significant progress w.r.t. the mean Intersection-over Union (mIoU) with the cross-entropy loss. However, the cross-entropy loss can essentially ignore the difference of severity for an autonomous car with different wrong prediction mistakes. For example, predicting the car to the road is much more servery than recognize it as the bus. Targeting for this difficulty, we develop a Wasserstein training framework to explore the inter-class correlation by defining its ground metric as misclassification severity. The ground metric of Wasserstein distance can be pre-defined following the experience on a specific task. From the optimization perspective, we further propose to set the ground metric as an increasing function of the pre-defined ground metric. Furthermore, an adaptively learning scheme of the ground matrix is proposed to utilize the high-fidelity CARLA simulator. Specifically, we follow a reinforcement alternative learning scheme. The experiments on both CamVid and Cityscapes datasets evidenced the effectiveness of our Wasserstein loss. The SegNet, ENet, FCN and Deeplab networks can be adapted following a plug-in manner. We achieve significant improvements on the predefined important classes, and much longer continuous playtime in our simulator.

preprint2017arXiv

Exploiting ITO colloidal nanocrystals for ultrafast pulse generation

Dynamical materials that capable of responding to optical stimuli have always been pursued for designing novel photonic devices and functionalities, of which the response speed and amplitude as well as integration adaptability and energy effectiveness are especially critical. Here we show ultrafast pulse generation by exploiting the ultrafast and sensitive nonlinear dynamical processes in tunably solution-processed colloidal epsilon-near-zero (ENZ) transparent conducting oxide (TCO) nanocrystals (NCs), of which the potential respond response speed is >2 THz and modulation depth is ~23% pumped at ~0.7 mJ/cm2, benefiting from the highly confined geometry in addition to the ENZ enhancement effect. These ENZ NCs may offer a scalable and printable material solution for dynamic photonic and optoelectronic devices.