Source author record

Cheng Luo

Cheng Luo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Machine Learning, Computer Vision, cond-mat.str-el, Artificial Intelligence, Computation and Language, cond-mat.mtrl-sci, Information Retrieval, Neurons and Cognition

Catalog footprint

What is connected

14 works

8 topics

4 close collaborators

preprint 2026 arXiv

Delta Attention Residuals

Attention Residuals replace standard additive residual connections with learned softmax attention over previous layer outputs, enabling selective cross-layer routing. However, standard Attention Residuals still attend over cumulative hidden states in previous layers, which are highly redundant. We show that this redundancy leads to routing collapse in deeper layers: attention weights become low-contrast and closer to uniform (max weight ${\approx}$0.2), limiting the model's ability to select informative states in previous layers. This raises a key but underexplored design question: what layer-wise representations should be routed in Attention Residuals? To answer this question, we propose Delta Attention Residuals, which attend over deltas -- the change introduced by each sublayer ($\mathbf{v}_i = \mathbf{h}_{i+1} - \mathbf{h}_i$) -- instead of cumulative states. Delta representations are structurally diverse and yield higher-contrast attention distributions (max weight ${\approx}$0.6), enabling more selective and effective routing across layers. This principle applies at both per-sublayer and block granularity. Across all tested scales (220M--7.6B), Delta Attention Residuals consistently outperform both standard residuals and Attention Residuals, with 1.7--8.2\% validation perplexity gains. The approach also enables converting pretrained checkpoints into Delta Attention Residuals models via standard fine-tuning. Code is available at https://github.com/wdlctc/delta-attention-residuals-code.
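The difference between routing over cumulative states and routing over deltas can be illustrated with a minimal sketch. It is not taken from the linked repository: the fixed query vector, the scaled dot-product scoring, and the names `deltas` and `route` are assumptions made for illustration, and the toy hidden states are invented.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def deltas(hidden_states):
    """v_i = h_{i+1} - h_i for each pair of consecutive hidden states."""
    return [[b - a for a, b in zip(h_i, h_next)]
            for h_i, h_next in zip(hidden_states, hidden_states[1:])]


def route(query, candidates):
    """Softmax attention over candidate vectors (assumed dot-product scoring).

    Returns the attention weights and the weighted sum of the candidates.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, c) / scale for c in candidates])
    mixed = [sum(w * c[j] for w, c in zip(weights, candidates))
             for j in range(len(query))]
    return weights, mixed


# Toy hidden states h_0..h_3 and a hypothetical query vector.
h = [[1.0, 0.0], [1.0, 2.0], [4.0, 2.0], [4.0, 2.5]]
q = [1.0, 0.0]

w_states, _ = route(q, h[:-1])         # attend over cumulative states
w_deltas, mixed = route(q, deltas(h))  # attend over per-layer changes
print(w_states, w_deltas)
```

In a trained model the query and any projections would be learned; the sketch only shows which vectors serve as attention candidates in each variant.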

preprint 2022 arXiv

Frequency-driven Imperceptible Adversarial Attack on Semantic Similarity

Current adversarial attack research reveals the vulnerability of learning-based classifiers against carefully crafted perturbations. However, most existing attack methods have inherent limitations in cross-dataset generalization as they rely on a classification layer with a closed set of categories. Furthermore, the perturbations generated by these methods may appear in regions easily perceptible to the human visual system (HVS). To circumvent the former problem, we propose a novel algorithm that attacks semantic similarity on feature representations. In this way, we are able to fool classifiers without limiting attacks to a specific dataset. For imperceptibility, we introduce the low-frequency constraint to limit perturbations within high-frequency components, ensuring perceptual similarity between adversarial examples and originals. Extensive experiments on three datasets (CIFAR-10, CIFAR-100, and ImageNet-1K) and three public online platforms indicate that our attack can yield misleading and transferable adversarial examples across architectures and datasets. Additionally, visualization results and quantitative performance (in terms of four different metrics) show that the proposed algorithm generates more imperceptible perturbations than the state-of-the-art methods. Code is made available at.

preprint 2022 arXiv

Learning Multi-dimensional Edge Feature-based AU Relation Graph for Facial Action Unit Recognition

The activations of Facial Action Units (AUs) mutually influence one another. While the relationship between a pair of AUs can be complex and unique, existing approaches fail to specifically and explicitly represent such cues for each pair of AUs in each facial display. This paper proposes an AU relationship modelling approach that deep learns a unique graph to explicitly describe the relationship between each pair of AUs of the target facial display. Our approach first encodes each AU's activation status and its association with other AUs into a node feature. Then, it learns a pair of multi-dimensional edge features to describe multiple task-specific relationship cues between each pair of AUs. During both node and edge feature learning, our approach also considers the influence of the unique facial display on AUs' relationship by taking the full face representation as an input. Experimental results on BP4D and DISFA datasets show that both node and edge feature learning modules provide large performance improvements for CNN and transformer-based backbones, with our best systems achieving the state-of-the-art AU recognition results. Our approach not only has a strong capability in modelling relationship cues for AU recognition but also can be easily incorporated into various backbones. Our PyTorch code is made available.

preprint 2022 arXiv

Modality-Balanced Embedding for Video Retrieval

Video search has become the main routine for users to discover videos relevant to a text query on large short-video sharing platforms. While training a query-video bi-encoder model using online search logs, we identify a modality bias phenomenon: the video encoder almost entirely relies on text matching, neglecting other modalities of the videos such as vision and audio. This modality imbalance results from a) modality gap: the relevance between a query and a video text is much easier to learn as the query is also a piece of text, with the same modality as the video text; b) data bias: most training samples can be solved solely by text matching. Here we share our practices to improve the first retrieval stage, including our solution for the modality imbalance issue. We propose MBVR (short for Modality Balanced Video Retrieval) with two key components: manually generated modality-shuffled (MS) samples and a dynamic margin (DM) based on visual relevance. They encourage the video encoder to pay balanced attention to each modality. Through extensive experiments on a real-world dataset, we show empirically that our method is both effective and efficient in solving the modality bias problem. We have also deployed MBVR in a large video platform and observed a statistically significant boost over a highly optimized baseline in an A/B test and manual GSB evaluations.

preprint 2022 arXiv

PALRACE: Reading Comprehension Dataset with Human Data and Labeled Rationales

Pre-trained language models achieve high performance on machine reading comprehension (MRC) tasks, but the results are hard to explain. An appealing approach to making models explainable is to provide rationales for their decisions. To investigate whether human rationales can further improve current models and to facilitate supervised learning of human rationales, here we present PALRACE (Pruned And Labeled RACE), a new MRC dataset with human-labeled rationales for 800 passages selected from the RACE dataset. We further classified the question to each passage into 6 types. Each passage was read by at least 26 human readers, who labeled their rationales to answer the question. It is demonstrated that models such as RoBERTa-large outperform human readers in all 6 types of questions, including inference questions, but their performance can be further improved when they have access to the human rationales. Simpler models and pre-trained models that are not fine-tuned on the task benefit more from human rationales, and their performance can be boosted by more than 30% by rationales. With access to human rationales, a simple model based on the GloVe word embedding can reach the performance of BERT-base.

preprint 2021 arXiv

Minority Oversampling for Imbalanced Time Series Classification

Many important real-world applications involve time-series data with skewed distribution. Compared to conventional imbalance learning problems, the classification of imbalanced time-series data is more challenging due to high dimensionality and high inter-variable correlation. This paper proposes a structure preserving Oversampling method to combat the High-dimensional Imbalanced Time-series classification (OHIT). OHIT first leverages a density-ratio based shared nearest neighbor clustering algorithm to capture the modes of minority class in high-dimensional space. It then for each mode applies the shrinkage technique of large-dimensional covariance matrix to obtain accurate and reliable covariance structure. Finally, OHIT generates the structure-preserving synthetic samples based on multivariate Gaussian distribution by using the estimated covariance matrices. Experimental results on several publicly available time-series datasets (including unimodal and multimodal) demonstrate the superiority of OHIT against the state-of-the-art oversampling algorithms in terms of F1, G-mean, and AUC.
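The last two steps described above (covariance shrinkage, then Gaussian sampling) can be sketched for a single mode as follows. This is a simplified illustration, not the OHIT implementation: the clustering step is omitted, the shrinkage target (the diagonal of the sample covariance) and the fixed intensity `alpha` are assumptions, and all function names are hypothetical.

```python
import math
import random


def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def sample_covariance(rows, mu):
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (r[i] - mu[i]) * (r[j] - mu[j])
    return [[c / (n - 1) for c in row] for row in cov]


def shrink(cov, alpha):
    """(1 - alpha) * cov + alpha * diag(cov): off-diagonals are damped."""
    d = len(cov)
    return [[cov[i][j] if i == j else (1.0 - alpha) * cov[i][j]
             for j in range(d)] for i in range(d)]


def cholesky(a):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def oversample(rows, n_new, alpha=0.5, seed=0):
    """Draw n_new synthetic samples from N(mean, shrunk covariance)."""
    rng = random.Random(seed)
    mu = mean_vector(rows)
    low = cholesky(shrink(sample_covariance(rows, mu), alpha))
    d = len(mu)
    out = []
    for _ in range(n_new):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        out.append([mu[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                    for i in range(d)])
    return out


# Invented minority-class samples belonging to one mode.
minority = [[1.0, 2.0], [2.0, 2.5], [3.0, 4.5], [4.0, 4.0]]
synthetic = oversample(minority, n_new=5)
print(synthetic)
```

The sketch assumes every feature has nonzero variance within the mode; otherwise the Cholesky factorization fails. A data-driven choice of the shrinkage intensity would replace the fixed `alpha` in practice.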

preprint 2020 arXiv

Let's be Humorous: Knowledge Enhanced Humor Generation

The generation of humor is an under-explored and challenging problem. Previous works mainly utilize templates or replace phrases to generate humor. However, few works focus on freer forms and the background knowledge of humor. The linguistic theory of humor defines the structure of a humor sentence as set-up and punchline. In this paper, we explore how to generate a punchline given the set-up with the relevant knowledge. We propose a framework that can fuse the knowledge into end-to-end models. To our knowledge, this is the first attempt to generate punchlines with a knowledge-enhanced model. Furthermore, we create the first humor-knowledge dataset. The experimental results demonstrate that our method can make use of knowledge to generate fluent, funny punchlines, and that it outperforms several baselines.

preprint 2020 arXiv

Torque equilibrium spin wave theory study of anisotropy and Dzyaloshinskii-Moriya interaction effects on the indirect K-edge RIXS spectrum of a triangular lattice antiferromagnet

We apply the recently formulated torque equilibrium spin wave theory (TESWT) to compute the $1/S$-order interacting $K$-edge bimagnon resonant inelastic x-ray scattering (RIXS) spectra of an anisotropic triangular lattice antiferromagnet with Dzyaloshinskii-Moriya (DM) interaction. We extend the interacting torque equilibrium formalism, incorporating the effects of DM interaction, to appropriately account for the zero-point quantum fluctuation that manifests as the emergence of a spin Casimir effect in a noncollinear spin spiral state. Using inelastic neutron scattering data from Cs$_2$CuCl$_4$, we fit the $1/S$-corrected TESWT dispersion to extract exchange and DM interaction parameters. We use these new fit coefficients alongside other relevant model parameters to investigate, compare, and contrast the effects of spatial anisotropy and DM interaction on the RIXS spectra at various points across the magnetic Brillouin zone. We highlight the key features of the bi- and trimagnon RIXS spectrum at the two inequivalent rotonlike points, $M(0,2\pi/\sqrt{3})$ and $M^{\prime}(\pi,\pi/\sqrt{3})$, whose behavior is quite different from that of an isotropic triangular lattice system. While the roton RIXS spectrum at the $M$ point undergoes a spectral downshift with increasing anisotropy, the peak at the $M^\prime$ location loses its spectral strength without any shift. With the inclusion of DM interaction, the spiral phase is more stable and the peaks at both the $M$ and $M^\prime$ points exhibit a spectral upshift. Our calculation offers a practical example of how to calculate interacting RIXS spectra in a non-collinear quantum magnet using TESWT. Our findings provide an opportunity to experimentally test the predictions of the interacting TESWT formalism using RIXS, a spectroscopic method currently in vogue.

preprint 2016 arXiv

Smoothed Hierarchical Dirichlet Process: A Non-Parametric Approach to Constraint Measures

Time-varying mixture densities occur in many scenarios. For example, the distributions of keywords that appear in publications may evolve from year to year, and video frame features associated with multiple targets may evolve in a sequence. Any model that realistically caters to this phenomenon must exhibit two important properties: the underlying mixture densities must have an unknown number of mixtures, and there must be some "smoothness" constraints in place for the adjacent mixture densities. The traditional Hierarchical Dirichlet Process (HDP) may be suited to the first property, but certainly not the second. This is because each random measure in the lower hierarchies is sampled independently of the others and hence does not facilitate any temporal correlations. To overcome such shortcomings, we propose a new Smoothed Hierarchical Dirichlet Process (sHDP). The key novelty of this model is that we place a temporal constraint amongst the nearby discrete measures $\{G_j\}$ in the form of symmetric Kullback-Leibler (KL) Divergence with a fixed bound $B$. Although the constraint we place only involves a single scalar value, it nonetheless allows for flexibility in the corresponding successive measures. Remarkably, it also leads us to infer the model within the stick-breaking process, where the traditional Beta distribution used in stick-breaking is now replaced by a new constraint calculated from $B$. We present the inference algorithm and elaborate on its solutions. Our experiment using NIPS keywords shows the desirable effect of the model.
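The smoothness constraint can be made concrete with a small sketch that checks whether successive discrete measures stay within a symmetric KL bound. It rests on assumptions not stated in the abstract: the measures are truncated to a shared finite support with strictly positive weights, symmetric KL is taken as the sum of both directions, and the weights and bound below are invented.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions on the same finite support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def symmetric_kl(p, q):
    """Symmetric KL taken as KL(p || q) + KL(q || p) (assumed convention)."""
    return kl(p, q) + kl(q, p)


def within_bound(measures, bound):
    """True if every pair of adjacent measures satisfies symmetric KL <= bound."""
    return all(symmetric_kl(a, b) <= bound
               for a, b in zip(measures, measures[1:]))


# Invented mixture weights for three consecutive time steps.
G = [
    [0.50, 0.30, 0.20],
    [0.45, 0.35, 0.20],
    [0.10, 0.10, 0.80],
]
B = 0.1  # hypothetical bound

print(within_bound(G[:2], B))  # first two steps change only slightly
print(within_bound(G, B))      # the third step jumps abruptly
```

Some authors define the symmetric divergence as half of this sum; the bound would then be rescaled accordingly.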

preprint 2016 arXiv

Spin and quadrupolar orders in the spin-1 bilinear-biquadratic model for iron-based superconductors

Motivated by recent experimental and theoretical progress on the magnetic properties of iron-based superconductors, we provide a comprehensive analysis of the spin-1 bilinear-biquadratic (BBQ) model on the square lattice. Using a variational approach at the mean-field level, we identify the existence of various magnetic phases, including conventional spin dipolar orderings (ferro- or antiferromagnet), novel quadrupolar (spin nematic) orderings and mixed dipolar-quadrupolar orderings. In contrast to the usual Heisenberg model, the elementary excitations of the spin-1 BBQ model are described by the flavor-wave theory within the SU(3) representation. By fitting the experimental spin-wave dispersion, we determine the refined exchange couplings corresponding to the collinear antiferromagnetic iron pnictides. We also present the dynamic structure factors of both spin dipolar and quadrupolar components, with connections to future experiments.

preprint 2016 arXiv

The Dependent Random Measures with Independent Increments in Mixture Models

When observations are organized into groups where commonalities exist amongst them, dependent random measures can be an ideal choice for modeling. One of the properties of dependent random measures is that the atoms of the posterior distribution are shared amongst groups, and hence groups can borrow information from each other. When a normalized dependent random measures prior with independent increments is applied, we can derive the appropriate exchangeable probability partition function (EPPF), and subsequently also deduce its inference algorithm given any mixture model likelihood. We provide all necessary derivations and solutions for this framework. For demonstration, we used a mixture of Gaussians likelihood in combination with a dependent structure constructed by linear combinations of CRMs. Our experiments show superior performance when using this framework, where the inferred values, including the mixing weights and the number of clusters, both respond appropriately to the number of completely random measures used.

preprint 2015 arXiv

Signatures of indirect K-edge resonant inelastic x-ray scattering on magnetic excitations in triangular lattice antiferromagnet

We compute the K-edge indirect resonant inelastic x-ray scattering (RIXS) spectrum of a triangular lattice antiferromagnet in its ordered coplanar 3-sublattice 120 degree magnetic state. By considering the first order self-energy corrections to the spin wave spectrum, magnon decay rate, bimagnon interactions within the ladder approximation Bethe-Salpeter scheme, and the effect of three-magnon contributions up to 1/S order, we find that the RIXS spectrum is non-trivially affected. For a purely isotropic triangular lattice model, the peak splitting mechanism and the appearance of a multipeak RIXS structure are primarily dictated by the damping of magnon modes. At a scattering wavevector corresponding to the zone center $Γ$ point and at the roton point q=M, where the magnon decay rate is zero, a stable single peak forms. At the $Γ$ point, the contribution is purely trimagnon at the 1/S level and occurs approximately at the trimagnon energy of 6JS. The roton peak occurs at a lower energy of 4JS. The K-edge single peak RIXS spectrum at the roton momentum can be utilized as an experimental signature to detect the presence of roton excitations. A unique feature of the triangular lattice K-edge RIXS spectra is the nonvanishing RIXS intensity at both the zone center $Γ$ point and the antiferromagnetic wavevector K point. This result is in sharp contrast to the vanishing K-edge RIXS intensity of the collinear magnetic phases on the square lattice. We find that including XXZ anisotropy leads to additional peak splitting, including at the roton scattering wavevector where the single peak destabilizes towards a two-peak structure. The observed splitting is consistent with our earlier theoretical prediction of the effects of spatial anisotropy on the RIXS spectra of a frustrated quantum magnet [Luo, Datta, and Yao, Phys. Rev. B 89, 165103 (2014)].

preprint 2014 arXiv

Bidirectional Control of Absence Seizures by the Basal Ganglia: A Computational Evidence

Absence epilepsy is believed to be associated with the abnormal interactions between the cerebral cortex and thalamus. Besides the direct coupling, anatomical evidence indicates that the cerebral cortex and thalamus also communicate indirectly through an important intermediate bridge, the basal ganglia. It has thus been postulated that the basal ganglia might play key roles in the modulation of absence seizures, but the relevant biophysical mechanisms are still not completely established. Using a biophysically based model, we demonstrate here that the typical absence seizure activities can be controlled and modulated by the direct GABAergic projections from the substantia nigra pars reticulata (SNr) to either the thalamic reticular nucleus (TRN) or the specific relay nuclei (SRN) of thalamus, through different biophysical mechanisms. Under certain conditions, these two types of seizure control are observed to coexist in the same network. More importantly, due to the competition between the inhibitory SNr-TRN and SNr-SRN pathways, we find that both decreasing and increasing the activation of SNr neurons from the normal level may considerably suppress the generation of spike-and-wave discharges (SWDs) in the coexistence region. Overall, these results highlight the bidirectional functional roles of basal ganglia in controlling and modulating absence seizures, and might provide novel insights into the therapeutic treatments of this brain disorder.

preprint 2014 arXiv

Spectrum splitting of bimagnon excitations in a spatially frustrated Heisenberg antiferromagnet revealed by resonant inelastic x-ray scattering

We perform a comprehensive analysis of the bimagnon resonant inelastic x-ray scattering (RIXS) intensity spectra of the spatially frustrated Jx-Jy-J2 Heisenberg model on a square lattice in both the antiferromagnetic and the collinear antiferromagnetic phase. We study the model for strong frustration and significant spatial anisotropy to highlight the key signatures of RIXS spectrum splitting which may be experimentally discernible. Based on an interacting spin wave theory study within the ladder approximation Bethe-Salpeter scheme, we find the appearance of a robust two-peak structure over a wide range of the transferred momenta in both magnetically ordered phases. The unfrustrated model has a single-peak structure, with the two-peak splitting originating from spatial anisotropy and frustrated interactions. Our predicted two-peak structure in both magnetically ordered regimes can be realized in iron pnictides.
