Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
31 works
0 followers
18 topics
4 close collaborators

Published work

31 published item(s)

preprint 2026 arXiv

A Unified Spoken Language Model with Injected Emotional-Attribution Thinking for Human-like Interaction

This paper presents a unified spoken language model for emotional intelligence, enhanced by a novel data construction strategy termed Injected Emotional-Attribution Thinking (IEAT). IEAT incorporates user emotional states and their underlying causes into the model's internal reasoning process, enabling emotion-aware reasoning to be internalized rather than treated as explicit supervision. The model is trained with a two-stage progressive strategy. The first stage performs speech-text alignment and emotional attribute modeling via self-distillation, while the second stage conducts end-to-end cross-modal joint optimization to ensure consistency between textual and spoken emotional expressions. Experiments on the Human-like Spoken Dialogue Systems Challenge (HumDial) Emotional Intelligence benchmark demonstrate that the proposed approach achieves top-ranked performance across emotional trajectory modeling, emotional reasoning, and empathetic response generation under both LLM-based and human evaluations.

preprint 2026 arXiv

TELEVAL: A Dynamic Benchmark Designed for Spoken Language Models in Chinese Interactive Scenarios

Spoken language models (SLMs) have advanced rapidly in recent years, accompanied by a growing number of evaluation benchmarks. However, most existing benchmarks emphasize task completion and capability scaling, while remaining poorly aligned with how users interact with SLMs in real-world spoken conversations. Effective spoken interaction requires not only accurate understanding of user intent and content, but also the ability to respond with appropriate interactional strategies. In this paper, we present TELEVAL, a dynamic, user-centered benchmark for evaluating SLMs in realistic Chinese spoken interaction scenarios. TELEVAL consolidates evaluation into two core aspects. Reliable Content Fulfillment assesses whether models can comprehend spoken inputs and produce semantically correct responses. Interactional Appropriateness evaluates whether models act as socially capable interlocutors, requiring them not only to generate human-like, colloquial responses, but also to implicitly incorporate paralinguistic cues for natural interaction. Experiments reveal that, despite strong performance on semantic and knowledge-oriented tasks, current SLMs still struggle to produce natural and interactionally appropriate responses, highlighting the need for more interaction-faithful evaluation.

preprint 2024 arXiv

Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation

Vision-Language Pre-training has demonstrated remarkable zero-shot recognition ability and the potential to learn generalizable visual representations from language supervision. Taking a step further, language-supervised semantic segmentation enables spatial localization of textual inputs by learning pixel grouping solely from image-text pairs. Nevertheless, the state of the art suffers from clear semantic gaps between the visual and textual modalities: many visual concepts that appear in images are missing from their paired captions. Such semantic misalignment circulates in pre-training, leading to inferior zero-shot performance in dense predictions because the textual representations capture too few visual concepts. To close this semantic gap, we propose Concept Curation (CoCu), a pipeline that leverages CLIP to compensate for the missing semantics. For each image-text pair, we establish a concept archive that maintains potential visually-matched concepts with our proposed vision-driven expansion and text-to-vision-guided ranking. Relevant concepts can thus be identified via cluster-guided sampling and fed into pre-training, thereby bridging the gap between visual and textual semantics. Extensive experiments over a broad suite of 8 segmentation benchmarks show that CoCu achieves superb zero-shot transfer performance and boosts the language-supervised segmentation baseline by a large margin, suggesting the value of bridging the semantic gap in pre-training data.

preprint 2023 arXiv

InfoFair: Information-Theoretic Intersectional Fairness

Algorithmic fairness is becoming increasingly important in data mining and machine learning. Among others, a foundational notion is group fairness. The vast majority of the existing works on group fairness, with a few exceptions, primarily focus on debiasing with respect to a single sensitive attribute, despite the fact that the co-existence of multiple sensitive attributes (e.g., gender, race, marital status, etc.) in the real world is commonplace. As such, methods that can ensure a fair learning outcome with respect to all sensitive attributes of concern simultaneously need to be developed. In this paper, we study the problem of information-theoretic intersectional fairness (InfoFair), where statistical parity, a representative group fairness measure, is guaranteed among demographic groups formed by multiple sensitive attributes of interest. We formulate it as a mutual information minimization problem and propose a generic end-to-end algorithmic framework to solve it. The key idea is to leverage a variational representation of mutual information, which considers the variational distribution between learning outcomes and sensitive attributes, as well as the density ratio between the variational and the original distributions. Our proposed framework is generalizable to many different settings, including other statistical notions of fairness, and can handle any type of learning task equipped with a gradient-based optimizer. Empirical evaluations in the fair classification task on three real-world datasets demonstrate that our proposed framework can effectively debias the classification results with minimal impact on the classification accuracy.
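
As a rough illustration of the fairness measure named in this abstract, the sketch below computes positive-prediction rates for groups formed by several sensitive attributes at once and reports the largest gap between them. It only measures statistical parity; it is not the paper's mutual-information framework, and the function name, data, and attribute values are all hypothetical.

```python
from collections import defaultdict

def intersectional_parity_gap(predictions, attributes):
    """Return per-group positive rates and the max-min gap.

    predictions: list of 0/1 model outputs.
    attributes:  list of tuples, one tuple of sensitive-attribute values
                 per example (each distinct tuple is one intersectional group).
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for y_hat, group in zip(predictions, attributes):
        totals[group] += 1
        positives[group] += y_hat
    rates = {g: positives[g] / totals[g] for g in totals}
    return rates, max(rates.values()) - min(rates.values())

# Hypothetical toy data: two sensitive attributes, four intersectional groups.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = [("F", "A"), ("F", "A"), ("F", "B"), ("F", "B"),
          ("M", "A"), ("M", "A"), ("M", "B"), ("M", "B")]
rates, gap = intersectional_parity_gap(preds, groups)
```

A gap of zero would mean every intersectional group receives positive predictions at the same rate, which is the condition statistical parity asks for.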

preprint 2022 arXiv

A Soft-Thresholding Operator for Sparse Time-Varying Effects in Survival Models

We consider a class of Cox models with time-dependent effects that may be zero over certain unknown time regions or, in short, sparse time-varying effects. The model is particularly useful for biomedical studies as it conveniently depicts the gradual evolution of effects of risk factors on survival. Statistically, estimating and drawing inference on infinite dimensional functional parameters with sparsity (e.g., time-varying effects with zero-effect time intervals) present enormous challenges. To address them, we propose a new soft-thresholding operator for modeling sparse, piecewise smooth and continuous time-varying coefficients in a Cox time-varying effects model. Unlike the common regularized methods, our approach enables one to estimate non-zero time-varying effects and detect zero regions simultaneously, and construct a new type of sparse confidence intervals that accommodate zero regions. This leads to a more interpretable model with a straightforward inference procedure. We develop an efficient algorithm for inference in the target functional space, show that the proposed method enjoys desired theoretical properties, and present its finite sample performance by way of simulations. We apply the proposed method to analyze the data of the Boston Lung Cancer Survivor Cohort, an epidemiological cohort study investigating the impacts of risk factors on lung cancer survival, and obtain clinically useful results.
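
The abstract does not spell out the form of the operator. As a hedged sketch, the classical soft-thresholding function S(x, λ) = sign(x)·max(|x| − λ, 0), applied pointwise to a latent coefficient curve, shows how exact-zero time regions can arise while the effect stays continuous. The paper's operator may differ, and the latent curve and threshold below are made up for illustration.

```python
def soft_threshold(x: float, lam: float) -> float:
    """Classical soft-thresholding: shrink x toward zero by lam and
    return exactly 0.0 whenever |x| <= lam."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def sparse_effect(latent, times, lam):
    """Evaluate a sparse time-varying effect as the soft-thresholded latent curve."""
    return [soft_threshold(latent(t), lam) for t in times]

# Hypothetical latent curve that crosses zero near t = 1.
def latent_curve(t: float) -> float:
    return 1.0 - t

times = [0.0, 0.5, 1.0, 1.5, 2.0]
beta = sparse_effect(latent_curve, times, lam=0.5)
```

Here the effect is nonzero at the ends of the time grid and exactly zero in the middle, which is the kind of zero-effect interval the abstract describes.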

preprint 2022 arXiv

Adversarial Sample Detection for Speaker Verification by Neural Vocoders

Automatic speaker verification (ASV), one of the most important technologies for biometric identification, has been widely adopted in security-critical applications. However, ASV is seriously vulnerable to recently emerged adversarial attacks, yet effective countermeasures against them are limited. In this paper, we adopt neural vocoders to spot adversarial samples for ASV. We use the neural vocoder to re-synthesize audio and find that the difference between the ASV scores for the original and re-synthesized audio is a good indicator for discriminating between genuine and adversarial samples. This effort is, to the best of our knowledge, among the first to pursue such a technical direction for detecting time-domain adversarial samples for ASV, and hence there is a lack of established baselines for comparison. Consequently, we implement the Griffin-Lim algorithm as the detection baseline. The proposed approach achieves effective detection performance that outperforms the baselines in all the settings. We also show that the neural vocoder adopted in the detection framework is dataset-independent. Our code will be made open source so that future work can make fair comparisons.
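
The detection rule described in this abstract can be sketched as a threshold on the score difference. The version below assumes that a larger difference flags an adversarial sample and that a fixed threshold is used; neither detail is stated in the abstract. The scoring and re-synthesis functions are toy stand-ins (a dot product and sample rounding), not a real ASV system or neural vocoder.

```python
from typing import Callable, Sequence

Audio = Sequence[float]

def is_adversarial(
    audio: Audio,
    enrolled: Audio,
    asv_score: Callable[[Audio, Audio], float],
    resynthesize: Callable[[Audio], Audio],
    threshold: float,
) -> bool:
    """Flag the sample when re-synthesis changes the ASV score by more than threshold."""
    original = asv_score(audio, enrolled)
    resynth = asv_score(resynthesize(audio), enrolled)
    return abs(original - resynth) > threshold

# Toy stand-ins, purely illustrative.
def toy_score(a: Audio, b: Audio) -> float:
    return sum(x * y for x, y in zip(a, b))

def toy_resynthesize(a: Audio) -> Audio:
    # Rounding mimics re-synthesis discarding small perturbations.
    return [round(x, 1) for x in a]

enrolled = [0.5, -0.5, 0.5]
genuine = [0.5, -0.5, 0.5]
perturbed = [0.54, -0.54, 0.54]  # small hypothetical perturbation

genuine_flag = is_adversarial(genuine, enrolled, toy_score, toy_resynthesize, 0.01)
perturbed_flag = is_adversarial(perturbed, enrolled, toy_score, toy_resynthesize, 0.01)
```

In practice the threshold would have to be chosen on held-out genuine data; the value used here is arbitrary.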

preprint 2022 arXiv

Bayesian learning of COVID-19 Vaccine safety while incorporating Adverse Events ontology

While vaccines are crucial to ending the COVID-19 pandemic, public confidence in vaccine safety has always been vulnerable. Many statistical methods have been applied to the VAERS (Vaccine Adverse Event Reporting System) database to study the safety of COVID-19 vaccines. However, all these methods ignore the adverse event (AE) ontology. AEs are naturally related; for example, events of retching, dysphagia, and reflux are all related to an abnormal digestive system. Explicitly bringing AE relationships into the model can aid in the detection of true AE signals amid the noise while reducing false positives. We propose a Bayesian graphical model to estimate all AEs while incorporating the AE ontology simultaneously. We propose strategies to construct conjugate forms leading to an efficient Gibbs sampler. Building upon the posterior distributions, we propose a negative control approach to mitigate reporting bias and an enrichment approach to detect AE groups of concern. The proposed methods were evaluated using simulation studies and further illustrated by studying the safety of COVID-19 vaccines. The proposed methods are implemented in the R package BGrass, and the source code is available at https://github.com/BangyaoZhao/BGrass.

preprint 2022 arXiv

Deep Historical Borrowing Framework to Prospectively and Simultaneously Synthesize Control Information in Confirmatory Clinical Trials with Multiple Endpoints

In current clinical trial development, historical information is receiving more attention as it provides utility beyond sample size calculation. Meta-analytic-predictive (MAP) priors and robust MAP priors have been proposed for prospectively borrowing historical data on a single endpoint. To simultaneously synthesize control information from multiple endpoints in confirmatory clinical trials, we propose to approximate posterior probabilities from a Bayesian hierarchical model and estimate critical values by deep learning to construct pre-specified strategies for hypothesis testing. This feature is important to ensure study integrity by establishing prospective decision functions before the trial conduct. Simulations are performed to show that our method properly controls family-wise error rate (FWER) and preserves power as compared with a typical practice of choosing constant critical values given a subset of null space. Satisfactory performance under prior-data conflict is also demonstrated. We further illustrate our method using a case study in Immunology.

preprint 2022 arXiv

Finite-Sample Two-Group Composite Hypothesis Testing via Machine Learning

In the problem of composite hypothesis testing, identifying the potential uniformly most powerful (UMP) unbiased test is of great interest. Beyond typical hypothesis settings with exponential families, it is usually challenging to prove the existence of such UMP unbiased tests and to construct them with a finite sample size. For example, in the COVID-19 pandemic, with limited previous assumptions on the treatment under investigation and the standard of care, adaptive clinical trials are appealing due to ethical considerations and the ability to accommodate uncertainty while conducting the trial. Although several methods have been proposed to control type I error rates, how to find a more powerful hypothesis testing strategy is still an open question. Motivated by this problem, we propose an automatic framework for constructing test statistics and corresponding critical values via machine learning methods to enhance power in a finite sample. In this article, we particularly illustrate the performance using Deep Neural Networks (DNN) and discuss their advantages. Simulations and two case studies of adaptive designs demonstrate that our method is automatic, general and pre-specified, and constructs statistics with satisfactory power in finite samples. Supplemental materials are available online, including R code and an R Shiny app.

preprint 2022 arXiv

Image Response Regression via Deep Neural Networks

Delineating the associations between images and a vector of covariates is of central interest in medical imaging studies. To tackle this problem of image response regression, we propose a novel nonparametric approach in the framework of spatially varying coefficient models, where the spatially varying functions are estimated through deep neural networks. Compared to existing solutions, the proposed method explicitly accounts for spatial smoothness and subject heterogeneity, has straightforward interpretations, and is highly flexible and accurate in capturing complex association patterns. A key idea in our approach is to treat the image voxels as the effective samples, which not only alleviates the limited sample size issue that haunts the majority of medical imaging studies, but also leads to more robust and reproducible results. Focusing on a broad family of piecewise smooth functions, we establish the estimation and selection consistency, and derive the asymptotic error bounds. We demonstrate the efficacy of the method through intensive simulations, and further illustrate its advantages with analyses of two functional magnetic resonance imaging datasets.

preprint 2022 arXiv

Individualized Risk Assessment of Preoperative Opioid Use by Interpretable Neural Network Regression

Preoperative opioid use has been reported to be associated with higher preoperative opioid demand, worse postoperative outcomes, and increased postoperative healthcare utilization and expenditures. Understanding the risk of preoperative opioid use helps establish patient-centered pain management. In the field of machine learning, the deep neural network (DNN) has emerged as a powerful means for risk assessment because of its superb prediction power; however, black-box algorithms may make the results less interpretable than statistical models. Bridging the gap between the statistical and machine learning fields, we propose a novel Interpretable Neural Network Regression (INNER), which combines the strengths of statistical and DNN models. We use the proposed INNER to conduct individualized risk assessment of preoperative opioid use. Intensive simulations and an analysis of 34,186 patients expecting surgery in the Analgesic Outcomes Study (AOS) show that the proposed INNER not only can predict preoperative opioid use from preoperative characteristics as accurately as a DNN, but also can estimate the patient-specific odds of opioid use without pain and the odds ratio of opioid use for a unit increase in the reported overall body pain, leading to more straightforward interpretations of the tendency to use opioids than a DNN. Our results identify the patient characteristics that are strongly associated with opioid use and are largely consistent with previous findings, providing evidence that INNER is a useful tool for individualized risk assessment of preoperative opioid use.

preprint 2022 arXiv

Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition

In Uyghur speech, consonant and vowel reduction are often encountered, especially in spontaneous speech with a high speech rate, which degrades speech recognition performance. To solve this problem, we propose an effective phone mask training method for Conformer-based Uyghur end-to-end (E2E) speech recognition. The idea is to randomly mask a certain percentage of phone features during model training, which simulates the above verbal phenomena and helps the E2E model learn more contextual information. According to experiments, the above issues can be greatly alleviated. In addition, we investigate different masking units in depth, which shows the effectiveness of our proposed masking unit. We also further study the masking method and optimize the filling strategy of the phone mask. Finally, compared with the Conformer-based E2E baseline without mask training, our model demonstrates about 5.51% relative Word Error Rate (WER) reduction on reading speech and 12.92% on spontaneous speech. The above approach has also been verified on the test set of the open-source data THUYG-20, where it shows 20% relative improvement.
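
A minimal sketch of phone-level masking is shown below, assuming frame-level features and known phone boundaries (for example, from a forced alignment). The masking ratio, the zero-fill strategy, and all names are illustrative assumptions; the abstract says the authors study and optimize the filling strategy, which is not reproduced here.

```python
import random

def mask_phones(features, phone_spans, mask_ratio, rng, fill=0.0):
    """Return a copy of features with a random subset of phones masked.

    features:    list of frames, each a list of floats.
    phone_spans: list of (start, end) frame indices, end exclusive, one per phone.
    mask_ratio:  fraction of phones to mask (assumed hyperparameter).
    fill:        value written into masked frames (assumed: zero fill).
    """
    n_mask = round(mask_ratio * len(phone_spans))
    chosen = rng.sample(range(len(phone_spans)), n_mask)
    masked = [list(frame) for frame in features]  # leave the input untouched
    for idx in chosen:
        start, end = phone_spans[idx]
        for t in range(start, end):
            masked[t] = [fill] * len(masked[t])
    return masked, sorted(chosen)

# Hypothetical utterance: 6 frames of 2-dimensional features, 3 phones of 2 frames each.
features = [[float(t + 1), float(t + 1)] for t in range(6)]
spans = [(0, 2), (2, 4), (4, 6)]
masked, chosen = mask_phones(features, spans, mask_ratio=0.34, rng=random.Random(0))
```

Masking whole phone spans rather than arbitrary frames is what lets the augmentation imitate a reduced or dropped phone.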

preprint 2022 arXiv

Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition

General accent recognition (AR) models tend to directly extract low-level information from spectra, and they often overfit significantly to speakers or channels. Since an accent can be regarded as a series of shifts relative to native pronunciation, distinguishing accents becomes an easier task with the accent shift as input. However, because no native utterance is available as an anchor, estimating the accent shift is difficult. In this paper, we propose linguistic-acoustic similarity based accent shift (LASAS) for AR tasks. For an accented speech utterance, after mapping the corresponding text vector to multiple accent-associated spaces as anchors, its accent shift can be estimated by the similarities between the acoustic embedding and those anchors. Then, we concatenate the accent shift with a dimension-reduced text vector to obtain a linguistic-acoustic bimodal representation. Compared with a pure acoustic embedding, the bimodal representation is richer and clearer because it takes full advantage of both linguistic and acoustic information, which can effectively improve AR performance. Experiments on the Accented English Speech Recognition Challenge (AESRC) dataset show that our method achieves 77.42% accuracy on the test set, obtaining a 6.94% relative improvement over a competitive system in the challenge.
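
The two steps in this abstract (similarities to anchors, then concatenation) can be sketched as follows. Cosine similarity is an assumption here, since the abstract only says "similarities", and the vectors are toy values rather than learned embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def accent_shift(acoustic, anchors):
    """One similarity per accent-associated anchor (assumed: cosine)."""
    return [cosine(acoustic, anchor) for anchor in anchors]

def bimodal_representation(acoustic, anchors, reduced_text):
    """Concatenate the accent shift with a dimension-reduced text vector."""
    return accent_shift(acoustic, anchors) + list(reduced_text)

# Hypothetical 2-dimensional embeddings with two anchors.
acoustic = [1.0, 0.0]
anchors = [[1.0, 0.0], [0.0, 1.0]]
representation = bimodal_representation(acoustic, anchors, reduced_text=[0.2])
```

The resulting vector has one entry per anchor plus the reduced text dimensions, so its length does not depend on the size of the acoustic embedding.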

preprint 2022 arXiv

M2HF: Multi-level Multi-modal Hybrid Fusion for Text-Video Retrieval

Videos contain multi-modal content, and exploring multi-level cross-modal interactions with natural language queries can greatly benefit the text-video retrieval (TVR) task. However, recent methods that apply the large-scale pre-trained model CLIP to TVR do not focus on multi-modal cues in videos. Furthermore, traditional methods that simply concatenate multi-modal features do not exploit fine-grained cross-modal information in videos. In this paper, we propose a multi-level multi-modal hybrid fusion (M2HF) network to explore comprehensive interactions between text queries and the content of each modality in videos. Specifically, M2HF first fuses visual features extracted by CLIP with audio and motion features extracted from videos at an early stage, obtaining audio-visual fusion features and motion-visual fusion features respectively. The multi-modal alignment problem is also considered in this process. Then, visual features, audio-visual fusion features, motion-visual fusion features, and texts extracted from videos establish cross-modal relationships with caption queries in a multi-level way. Finally, the retrieval outputs from all levels are late fused to obtain the final text-video retrieval results. Our framework provides two kinds of training strategies: an ensemble manner and an end-to-end manner. Moreover, a novel multi-modal balance loss function is proposed to balance the contributions of each modality for efficient end-to-end training. M2HF allows us to obtain state-of-the-art results on various benchmarks, e.g., Rank@1 of 64.9%, 68.2%, 33.2%, 57.1%, 57.8% on MSR-VTT, MSVD, LSMDC, DiDeMo, and ActivityNet, respectively.

preprint 2022 arXiv

Optimizing Graphical Procedures for Multiplicity Control in a Confirmatory Clinical Trial via Deep Learning

In confirmatory clinical trials, it has been proposed to use a simple iterative graphical approach to construct and perform intersection hypotheses tests with a weighted Bonferroni-type procedure to control type I errors in the strong sense. Given Phase II study results or other prior knowledge, it is usually of main interest to find the optimal graph that maximizes a certain objective function in a future Phase III study. In this article, we evaluate the performance of two existing derivative-free constrained methods, and further propose a deep learning enhanced optimization framework. Our method numerically approximates the objective function via feedforward neural networks (FNNs) and then performs optimization with available gradient information. It can be constrained so that some features of the testing procedure are held fixed while optimizing over other features. Simulation studies show that our FNN-based approach has a better balance between robustness and time efficiency than some existing derivative-free constrained optimization algorithms. Compared to the traditional stochastic search method, our optimizer has moderate multiplicity adjusted power gain when the number of hypotheses is relatively large. We further apply it to a case study to illustrate how to optimize a multiple testing procedure with respect to a specific study objective.
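
For readers unfamiliar with the iterative graphical approach this abstract builds on, the sketch below follows the commonly described update rule for weighted Bonferroni graphs: reject a hypothesis whose p-value is at most its weight times α, pass its weight along the outgoing edges, and rewire the remaining graph. It does not implement the paper's neural-network-based optimization of the graph, and the weights, transition matrix, and p-values are made-up inputs.

```python
def graphical_test(p_values, weights, graph, alpha):
    """Sequentially rejective graphical procedure; returns rejected indices in order.

    weights: initial local weights (summing to at most 1).
    graph:   transition matrix, graph[i][j] = fraction of hypothesis i's weight
             passed to hypothesis j once i is rejected.
    """
    w = list(weights)
    g = [list(row) for row in graph]
    active = list(range(len(p_values)))
    rejected = []
    while True:
        hit = next((i for i in active if p_values[i] <= w[i] * alpha), None)
        if hit is None:
            return rejected
        active.remove(hit)
        rejected.append(hit)
        new_w = list(w)
        new_g = [list(row) for row in g]
        for j in active:
            # Redistribute the rejected hypothesis's weight along its edges.
            new_w[j] = w[j] + w[hit] * g[hit][j]
            for k in active:
                if j == k:
                    continue
                denom = 1.0 - g[j][hit] * g[hit][j]
                new_g[j][k] = (g[j][k] + g[j][hit] * g[hit][k]) / denom if denom > 0 else 0.0
        w, g = new_w, new_g

# Two hypotheses with equal weights that pass all weight to each other
# (this graph reproduces the Holm procedure).
holm_graph = [[0.0, 1.0], [1.0, 0.0]]
both = graphical_test([0.01, 0.04], [0.5, 0.5], holm_graph, alpha=0.05)
none = graphical_test([0.03, 0.04], [0.5, 0.5], holm_graph, alpha=0.05)
```

In the first call the second hypothesis is rejected only because it inherits the full α after the first rejection; in the second call neither p-value clears the initial level of 0.025.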

preprint 2022 arXiv

RawlsGCN: Towards Rawlsian Difference Principle on Graph Convolutional Network

Graph Convolutional Network (GCN) plays pivotal roles in many real-world applications. Despite the successes of GCN deployment, GCN often exhibits performance disparity with respect to node degrees, resulting in worse predictive accuracy for low-degree nodes. We formulate the problem of mitigating the degree-related performance disparity in GCN from the perspective of the Rawlsian difference principle, which originates from the theory of distributive justice. Mathematically, we aim to balance the utility between low-degree nodes and high-degree nodes while minimizing the task-specific loss. Specifically, we reveal the root cause of this degree-related unfairness by analyzing the gradients of weight matrices in GCN. Guided by the gradients of weight matrices, we further propose a pre-processing method, RawlsGCN-Graph, and an in-processing method, RawlsGCN-Grad, that achieve fair predictive accuracy in low-degree nodes without modifying the GCN architecture or introducing additional parameters. Extensive experiments on real-world graphs demonstrate the effectiveness of our proposed RawlsGCN methods in significantly reducing degree-related bias while retaining comparable overall performance.

preprint 2022 arXiv

Understanding the dynamic impact of COVID-19 through competing risk modeling with bivariate varying coefficients

The coronavirus disease 2019 (COVID-19) pandemic has exerted a profound impact on patients with end-stage renal disease relying on kidney dialysis to sustain their lives. Motivated by a request by the U.S. Centers for Medicare & Medicaid Services, our analysis of their postdischarge hospital readmissions and deaths in 2020 revealed that the COVID-19 effect has varied significantly with postdischarge time and time since the onset of the pandemic. However, the complex dynamics of the COVID-19 effect trajectories cannot be characterized by existing varying coefficient models. To address this issue, we propose a bivariate varying coefficient model for competing risks within a cause-specific hazard framework, where tensor-product B-splines are used to estimate the surface of the COVID-19 effect. An efficient proximal Newton algorithm is developed to facilitate the fitting of the new model to the massive Medicare data for dialysis patients. Difference-based anisotropic penalization is introduced to mitigate model overfitting and the wiggliness of the estimated trajectories; various cross-validation methods are considered in the determination of optimal tuning parameters. Hypothesis testing procedures are designed to examine whether the COVID-19 effect varies significantly with postdischarge time and the time since pandemic onset, either jointly or separately. Simulation experiments are conducted to evaluate the estimation accuracy, type I error rate, statistical power, and model selection procedures. Applications to Medicare dialysis patients demonstrate the real-world performance of the proposed methods.

preprint 2021 arXiv

Cascades between light and heavy fermions in the normal state of magic angle twisted bilayer graphene

We present a framework for understanding the recently observed cascade transitions and the Landau level degeneracies at every integer filling of twisted bilayer graphene. The Coulomb interaction projected onto narrow bands causes the charged excitations at an integer filling to disperse, forming new bands. If the excitation moves the filling away from the charge neutrality point, then it has a band minimum at the moire Brillouin zone center with a small mass that compares well with the experiment; if towards the charge neutrality point, then it has a much larger mass and a higher degeneracy. At a non-zero density away from an integer filling the excitations interact. The system on the small mass side has a large bandwidth and forms a Fermi liquid. On the large mass side the bandwidth is narrow, the compressibility is negative and the Fermi liquid is likely unstable. This explains the observed sawtooth features in compressibility, the Landau fans pointing away from charge neutrality as well as their degeneracies. By providing a description of the charge itineracy in the normal state this framework sets the stage for superconductivity at lower temperatures.

preprint 2021 arXiv

Correlated Insulating Phases in the Twisted Bilayer Graphene

We review analytical and numerical studies of correlated insulating states in twisted bilayer graphene, focusing on real-space lattice model constructions and their unbiased quantum many-body solutions. We show that by constructing localized Wannier states for the narrow bands, the projected Coulomb interactions can be approximated by interactions of cluster charges with assisted nearest neighbor hopping terms. With the interaction part only, the Hamiltonian is $SU(4)$ symmetric considering both spin and valley degrees of freedom. In the strong coupling limit where the kinetic terms are neglected, the ground states are found to be in the $SU(4)$ manifold with degeneracy. The kinetic terms, treated as a perturbation, break this large $SU(4)$ symmetry and drive the appearance of intervalley coherent states, quantum topological insulators and other symmetry-breaking insulating states. We first present the theoretical analysis of the moiré lattice model construction and then show how to solve the model with large-scale quantum Monte Carlo simulations in an unbiased manner. We further outline potential directions in which the real-space model construction and its quantum many-body solutions can gradually explain the perplexing yet exciting experimental discoveries in the correlation physics of twisted bilayer graphene. This review should help readers grasp the fast-growing field of model studies of twisted bilayer graphene.

preprint 2021 arXiv

Correlation-induced insulating topological phases at charge neutrality in twisted bilayer graphene

Twisted bilayer graphene (TBG) provides a unique framework to elucidate the interplay between strong correlations and topological phenomena in two-dimensional systems. The existence of multiple electronic degrees of freedom -- charge, spin, and valley -- gives rise to a plethora of possible ordered states and instabilities. Identifying which of them are realized in the regime of strong correlations is fundamental to shedding light on the nature of the superconducting and correlated insulating states observed in the TBG experiments. Here, we use unbiased, sign-problem-free quantum Monte Carlo simulations to solve an effective interacting lattice model for TBG at charge neutrality. Besides the usual cluster Hubbard-like repulsion, this model also contains an assisted hopping interaction that emerges due to the non-trivial topological properties of TBG. Such a non-local interaction fundamentally alters the phase diagram at charge neutrality, gapping the Dirac cones even for infinitesimally small interaction. As the interaction strength increases, a sequence of different correlated insulating phases emerges, including a quantum valley Hall state with topological edge states, an intervalley-coherent insulator, and a valence bond solid. The charge-neutrality correlated insulating phases discovered here provide the sought-after reference states needed for a comprehensive understanding of the insulating states at integer fillings and the proximate superconducting states of TBG.

preprint 2021 arXiv

Topological and nematic superconductivity mediated by ferro-SU(4) fluctuations in twisted bilayer graphene

We propose an SU(4) spin-valley-fermion model to investigate the superconducting instabilities of twisted bilayer graphene (TBG). In this approach, bosonic fluctuations associated with an emergent SU(4) symmetry, corresponding to combined rotations in valley and spin spaces, couple to the low-energy fermions that comprise the flat bands. These fluctuations are peaked at zero wave-vector, reflecting the "ferromagnetic-like" SU(4) ground state recently found in strong-coupling solutions of microscopic models for TBG. Focusing on electronic states related to symmetry-imposed points of the Fermi surface, dubbed here "valley hot-spots" and "van Hove hot-spots", we find that the coupling to the itinerant electrons partially lifts the huge degeneracy of the ferro-SU(4) ground state manifold, favoring inter-valley order, spin-valley coupled order, ferromagnetic order, spin-current order, and valley-polarized order, depending on details of the band structure. These fluctuations, in turn, promote attractive pairing interactions in a variety of closely competing channels, including a nodeless $f$-wave state, a nodal $i$-wave state, and topological $d+id$ and $p+ip$ states with unusual Chern numbers $2$ and $4$, respectively. Nematic superconductivity, although not realized as a primary instability of the system, still appears as a consequence of the near-degeneracy of superconducting order parameters that transform as one-dimensional and two-dimensional irreducible representations of the point group $D_{6}$.

preprint 2020 arXiv

Bayesian Sparse Mediation Analysis with Targeted Penalization of Natural Indirect Effects

Causal mediation analysis aims to characterize an exposure's effect on an outcome and quantify the indirect effect that acts through a given mediator or a group of mediators of interest. With the increasing availability of measurements on a large number of potential mediators, like the epigenome or the microbiome, new statistical methods are needed to simultaneously accommodate high-dimensional mediators while directly targeting penalization of the natural indirect effect (NIE) for active mediator identification. Here, we develop two novel prior models for identification of active mediators in high-dimensional mediation analysis through penalizing NIEs in a Bayesian paradigm. Both methods specify a joint prior distribution on the exposure-mediator effect and mediator-outcome effect with either (a) a four-component Gaussian mixture prior or (b) a product threshold Gaussian prior. By jointly modeling the two parameters that contribute to the NIE, the proposed methods enable penalization on their product in a targeted way. The resulting inference can take into account the four-component composite structure underlying the NIE. We show through simulations that the proposed methods improve both selection and estimation accuracy compared to other competing methods. We applied our methods in an in-depth analysis of two ongoing epidemiologic studies: the Multi-Ethnic Study of Atherosclerosis (MESA) and the LIFECODES birth cohort. The identified active mediators in both studies reveal important biological pathways for understanding disease mechanisms.

preprint2020arXiv

Bayesian Symbolic Regression

Interpretability is crucial for machine learning in many scenarios, such as quantitative finance, banking, and healthcare. Symbolic regression (SR) is a classic interpretable machine learning method that bridges X and Y using mathematical expressions composed of some basic functions. However, the search space of all possible expressions grows exponentially with the length of the expression, making enumeration infeasible. Genetic programming (GP) has traditionally been used in SR to search for the optimal solution, but it suffers from several limitations, e.g., difficulty in incorporating prior knowledge, overly complicated output expressions, and reduced interpretability. To address these issues, we propose a new method to fit SR under a Bayesian framework. First, a Bayesian model can naturally incorporate prior knowledge (e.g., preference of basis functions, operators and raw features) to improve the efficiency of fitting SR. Second, to improve the interpretability of expressions in SR, we aim to capture concise but informative signals. To this end, we assume the expected signal has an additive structure, i.e., a linear combination of several concise expressions, whose complexity is controlled by a well-designed prior distribution. In our setup, each expression is characterized by a symbolic tree, and the proposed SR model can be solved by sampling symbolic trees from the posterior distribution using an efficient Markov chain Monte Carlo (MCMC) algorithm. Finally, compared with GP, the proposed BSR (Bayesian Symbolic Regression) method saves computer memory, with no need to keep an updated 'genome pool'. Numerical experiments show that, compared with GP, the solutions of BSR are closer to the ground truth and the expressions are more concise. Meanwhile, we find that the solution of BSR is robust to hyper-parameter specifications such as the number of trees.
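To make the additive structure concrete, here is a minimal sketch of expressions stored as symbolic trees and combined linearly. It covers only the representation and a least-squares fit of the combination weights for a fixed set of trees. The prior over trees and the MCMC sampler described in the abstract are not implemented, and the `Node` class, the `OPS` table, and the intercept term are assumptions made for illustration.

```python
import numpy as np

# Hypothetical operator table: name -> vectorized function.
OPS = {"add": np.add, "mul": np.multiply, "sin": np.sin, "exp": np.exp}

class Node:
    """A symbolic tree node: an operator with children, or a raw feature leaf."""

    def __init__(self, op=None, children=(), feature=None):
        self.op = op
        self.children = tuple(children)
        self.feature = feature

    def evaluate(self, X):
        if self.op is None:  # leaf: return one column of the design matrix
            return X[:, self.feature]
        return OPS[self.op](*(child.evaluate(X) for child in self.children))

    def size(self):
        """Number of nodes, a simple proxy for expression complexity."""
        return 1 + sum(child.size() for child in self.children)

def fit_additive(trees, X, y):
    """Least-squares weights for y ~ intercept + sum_k coef_k * tree_k(X)."""
    columns = [np.ones(len(X))] + [tree.evaluate(X) for tree in trees]
    coef, *_ = np.linalg.lstsq(np.column_stack(columns), y, rcond=None)
    return coef

# Synthetic data generated from y = 1 + 2*sin(x0) + 3*x0*x1.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * np.sin(X[:, 0]) + 3.0 * X[:, 0] * X[:, 1]

x0, x1 = Node(feature=0), Node(feature=1)
trees = [Node("sin", [x0]), Node("mul", [x0, x1])]
coef = fit_additive(trees, X, y)
print(np.round(coef, 6), [tree.size() for tree in trees])
```

In a sampler, the trees would be proposed and accepted stochastically. Here they are fixed by hand, so the fit simply recovers the generating weights.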

preprint2020arXiv

Crystalline Nodal Topological Superconductivity and Bogolyubov Fermi Surfaces in Monolayer NbSe$_2$

We present a microscopic calculation of the phase diagram of the Ising superconductor NbSe$_{2}$ in the presence of both an in-plane magnetic field and Rashba spin-orbit coupling (SOC). Repulsive interactions lead to two distinct instabilities, in the singlet and triplet interaction channels. While we recover the previously predicted nodal topological superconducting state in the absence of Rashba SOC at large magnetic field, with six pairs of nodes along the $Γ$-$M$ lines, a finite Rashba SOC breaks the symmetry that protects these nodes and therefore generally lifts them, resulting in a topologically trivial phase. There is an exception, however, when the field is applied along one of the three $Γ$-$K$ lines. In that case, a single mirror symmetry remains that can protect two pairs of nodes out of the original six, resulting in a crystalline topological superconducting phase. Depending on the Cooper pairs' center-of-mass momentum, this superconducting state displays either Bogolyubov Fermi surfaces or point nodes. Moreover, a chiral topological superconducting phase with a Chern number of 6 is realized in the regime of large Rashba SOC and dominant triplet interactions, spontaneously breaking time-reversal symmetry.

preprint2020arXiv

Deep Learning for Image Search and Retrieval in Large Remote Sensing Archives

This chapter presents recent advances in content-based image search and retrieval (CBIR) systems in remote sensing (RS) for fast and accurate information discovery from massive data archives. Initially, we analyze the limitations of traditional CBIR systems that rely on hand-crafted RS image descriptors. Then, we focus our attention on the advances in RS CBIR systems for which deep learning (DL) models are at the forefront. In particular, we present the theoretical properties of the most recent DL-based CBIR systems for the characterization of the complex semantic content of RS images. After discussing their strengths and limitations, we present the deep-hashing-based CBIR systems that offer highly time-efficient search within huge data archives. Finally, the most promising research directions in RS CBIR are discussed.

preprint2020arXiv

Learning Convolutional Sparse Coding on Complex Domain for Interferometric Phase Restoration

Interferometric phase restoration has been investigated for decades, and most of the state-of-the-art methods have achieved promising performance for InSAR phase restoration. These methods generally follow the nonlocal filtering processing chain, aiming at circumventing the staircase effect and preserving the details of phase variations. In this paper, we propose an alternative approach for InSAR phase restoration, namely Complex Convolutional Sparse Coding (ComCSC) and its gradient regularized version. To the best of our knowledge, this is the first time that the InSAR phase restoration problem has been solved in a deconvolutional fashion. The proposed methods can not only suppress interferometric phase noise, but also avoid the staircase effect and preserve the details. Furthermore, they provide insight into the elementary phase components of the interferometric phases. The experimental results on synthetic and realistic high- and medium-resolution datasets from TerraSAR-X StripMap and Sentinel-1 interferometric wide swath mode, respectively, show that our methods outperform previous state-of-the-art methods based on nonlocal InSAR filters, in particular InSAR-BM3D. The source code of this paper will be made publicly available for reproducible research inside the community.

preprint2020arXiv

Learning Shared Cross-modality Representation Using Multispectral-LiDAR and Hyperspectral Data

Due to the ever-growing diversity of data sources, multi-modality feature learning has attracted more and more attention. However, most of these methods are designed to jointly learn feature representations from multiple modalities that exist in both the training and test sets, and they have been less investigated in the absence of a certain modality in the test phase. To this end, in this letter, we propose to learn a shared feature space across multiple modalities in the training process. In this way, out-of-sample data from any of the modalities can be directly projected onto the learned space for a more effective cross-modality representation. More significantly, the shared space is regarded as a latent subspace in our proposed method, which connects the original multi-modal samples with label information to further improve the feature discrimination. Experiments are conducted on the multispectral-LiDAR and hyperspectral dataset provided by the 2018 IEEE GRSS Data Fusion Contest to demonstrate the effectiveness and superiority of the proposed method in comparison with several popular baselines.

preprint2020arXiv

Multipass SAR Interferometry Based on Total Variation Regularized Robust Low Rank Tensor Decomposition

Multipass SAR interferometry (InSAR) techniques based on meter-resolution spaceborne SAR satellites, such as TerraSAR-X or COSMO-SkyMed, provide 3D reconstruction and the measurement of ground displacement over large urban areas. Conventional methods such as Persistent Scatterer Interferometry (PSI) usually require a fairly large SAR image stack (usually on the order of tens) in order to achieve reliable estimates of these parameters. Recently, the low-rank property of multipass InSAR data stacks was explored and investigated in our previous work. By exploiting this low-rank prior, more accurate estimation of the geophysical parameters can be achieved, which in turn can effectively reduce the number of interferograms required for a reliable estimation. Based on that, this paper proposes a novel tensor decomposition method in the complex domain, which jointly exploits the low-rank and variational priors of the interferometric phase in InSAR data stacks. Specifically, a total variation (TV) regularized robust low-rank tensor decomposition method is exploited for recovering outlier-free InSAR stacks. We demonstrate that the filtered InSAR data stacks can greatly improve the accuracy of geophysical parameters estimated from real data. Moreover, this paper demonstrates for the first time in the community that tensor-decomposition-based methods can be beneficial for large-scale urban mapping problems using multipass InSAR. Two TerraSAR-X data stacks with large spatial areas demonstrate the promising performance of the proposed method.

preprint2020arXiv

Non-Abelian Dirac node braiding and near-degeneracy of correlated phases at odd integer filling in magic angle twisted bilayer graphene

We use the DMRG to study the correlated electron states favored by the Coulomb interaction projected onto the narrow bands of twisted bilayer graphene within a spinless one-valley model. The Hilbert space of the narrow bands is constructed from a pair of hybrid Wannier states with opposite Chern numbers. Depending on the parameters in the BM model, the DMRG in this basis determines the ground state at one particle per unit cell to be either a QAH state or a state with no Hall effect which is nearly a product state. Based on this form, we then apply the variational method to study their competition, thus identifying three states: the QAH, a gapless $C_2T$ symmetric nematic, and a gapped $C_2T$ symmetric stripe. All three states are nearly degenerate at realistic parameters of the BM model. The single particle spectrum of the nematic contains either a quadratic node or two close Dirac nodes near $Γ$. Motivated by the Landau level degeneracy found in this state, we propose it to be the state observed at the charge neutrality point once spin and valley degeneracies are restored. The optimal period for the $C_2T$ stripe state is found to be $2$ unit cells. In addition, using the fact that the topological charge of the nodes in the $C_2T$ nematic phase is no longer described simply by their winding numbers once the translation symmetry is broken, but rather by certain elements of a non-Abelian group that was recently pointed out, we identify the mechanism of the gap opening within the $C_2T$ stripe state. Although the nodes at the Fermi energy are locally stable, they can be annihilated after braiding with other nodes connecting them to adjacent (folded) bands. Therefore, if the translation symmetry is broken, the gap at one particle per unit cell can open even if the system preserves the $C_2T$ and valley $U(1)$ symmetries, and the gap to remote bands remains open.

preprint2020arXiv

Statistical Inference for High-Dimensional Vector Autoregression with Measurement Error

High-dimensional vector autoregression with measurement error is frequently encountered in a large variety of scientific and business applications. In this article, we study statistical inference of the transition matrix under this model. While there has been a large body of literature studying sparse estimation of the transition matrix, there is a paucity of inference solutions, especially in the high-dimensional scenario. We develop inferential procedures for both the global and simultaneous testing of the transition matrix. We first develop a new sparse expectation-maximization algorithm to estimate the model parameters, and carefully characterize their estimation precisions. We then construct a Gaussian matrix, after proper bias and variance corrections, from which we derive the test statistics. Finally, we develop the testing procedures and establish their asymptotic guarantees. We study the finite-sample performance of our tests through intensive simulations, and illustrate with a brain connectivity analysis example.

preprint2019arXiv

Minorization-Maximization-based Steepest Ascent for Large-scale Survival Analysis with Time-Varying Effects: Application to the National Kidney Transplant Dataset

The time-varying effects model is a flexible and powerful tool for modeling the dynamic changes of covariate effects. However, in survival analysis, its computational burden increases quickly as the sample size or the number of predictors grows. Traditional methods that perform well for moderate sample sizes and low-dimensional data do not scale to massive data. Analysis of national kidney transplant data, with a massive sample size and a large number of predictors, defies existing statistical methods and software. In view of these difficulties, we propose a Minorization-Maximization-based steepest ascent procedure for estimating the time-varying effects. Leveraging the block structure formed by the basis expansions, the proposed procedure iteratively updates the optimal block-wise direction along which the approximate increase in the log-partial likelihood is maximized. The resulting estimates ensure the ascent property and serve as refinements of the previous step. The performance of the proposed method is examined by simulations and applications to the analysis of national kidney transplant data.