Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
83 works
0 followers
30 topics
4 close collaborators

Published work

83 published item(s)

preprint, 2026, arXiv

Advancing Adaptive Multi-Stage Video Anomaly Reasoning: A Benchmark Dataset and Method

Recent progress in reasoning capabilities of Multimodal Large Language Models (MLLMs) has highlighted their potential for performing complex video understanding tasks. However, in the domain of Video Anomaly Detection and Understanding (VAD&U), existing MLLM-based methods are largely limited to anomaly localization or post-hoc description, lacking explicit reasoning processes, risk awareness, and decision-oriented interpretation. To address this gap, we define a new task termed Video Anomaly Reasoning (VAR), which elevates video anomaly analysis from descriptive understanding to structured, multi-stage reasoning. VAR explicitly requires models to perform progressive reasoning over anomalous events before answering anomaly-related questions, encompassing visual perception, causal interpretation, and risk-aware decision making. To support this task, we present a new dataset with 8,641 videos, where each video is annotated with diverse question types corresponding to different reasoning depths, totaling more than 50,000 samples, making it one of the largest datasets for video anomaly. The annotations are based on a structured Perception-Cognition-Action Chain-of-Thought (PerCoAct-CoT), which formalizes domain-specific reasoning priors for video anomaly understanding. This design enables systematic evaluation of multi-stage and adaptive anomaly reasoning. In addition, we propose Anomaly-Aware Group Relative Policy Optimization to further enhance reasoning reliability under weak supervision. Building upon the proposed task and dataset, we develop an end-to-end MLLM-based VAR model termed Vad-R1-Plus, which supports adaptive hierarchical reasoning and risk-aware decision making. Extensive experiments demonstrate that the proposed benchmark and method effectively advance the reasoning capabilities of MLLMs on VAR tasks, outperforming both open-source and proprietary baselines.

preprint, 2026, arXiv

DeepH-pack: A general-purpose neural network package for deep-learning electronic structure calculations

In computational physics and materials science, first-principles methods, particularly density functional theory, have become central tools for electronic structure prediction and materials design. Recently, rapid advances in artificial intelligence (AI) have begun to reshape the research landscape, giving rise to the emerging field of deep-learning electronic structure calculations. Despite numerous pioneering studies, the field remains in its early stages; existing software implementations are often fragmented, lacking unified frameworks and standardized interfaces required for broad community adoption. Here we present DeepH-pack, a comprehensive and unified software package that integrates first-principles calculations with deep learning. By incorporating fundamental physical principles into neural-network design, such as the nearsightedness principle and the equivariance principle, DeepH-pack achieves robust cross-scale and cross-material generalizability. This allows models trained on small-scale structures to generalize to large-scale and previously unseen materials. The toolkit preserves first-principles accuracy while accelerating electronic structure calculations by several orders of magnitude, establishing an efficient and intelligent computational paradigm for large-scale materials simulation, high-throughput materials database construction, and AI-driven materials discovery.

preprint, 2026, arXiv

Sterile Neutrino Dark Matter as a Probe of Inflationary Reheating

Sterile neutrinos offer a minimal and testable explanation for dark matter (DM), with their radiative decay actively searched for in X-ray observations. We show that cold sterile neutrino DM can be efficiently produced during reheating from inflaton decays with a tiny branching ratio, ${\rm BR}\lesssim 10^{-4}$. This production mechanism opens regions of parameter space where the active-sterile mixing is small enough to evade current X-ray constraints while reproducing the observed DM abundance. We systematically map the viable parameter space in terms of the sterile neutrino mass, mixing angle, inflaton mass, reheating temperature, and branching ratio. We further demonstrate that sterile neutrino DM can serve as a probe of inflationary reheating, with future X-ray observations capable of setting lower bounds on the reheating temperature several orders of magnitude above the existing bound from Big Bang Nucleosynthesis.

preprint, 2025, arXiv

WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables

Wearable devices such as AI glasses are transforming voice assistants into always-available, hands-free collaborators that integrate seamlessly with daily life, but they also introduce challenges like egocentric audio affected by motion and noise, rapid micro-interactions, and the need to distinguish device-directed speech from background conversations. Existing benchmarks largely overlook these complexities, focusing instead on clean or generic conversational audio. To bridge this gap, we present WearVox, the first benchmark designed to rigorously evaluate voice assistants in realistic wearable scenarios. WearVox comprises 3,842 multi-channel, egocentric audio recordings collected via AI glasses across five diverse tasks including Search-Grounded QA, Closed-Book QA, Side-Talk Rejection, Tool Calling, and Speech Translation, spanning a wide range of indoor and outdoor environments and acoustic conditions. Each recording is accompanied by rich metadata, enabling nuanced analysis of model performance under real-world constraints. We benchmark leading proprietary and open-source speech Large Language Models (SLLMs) and find that most real-time SLLMs achieve accuracies on WearVox ranging from 29% to 59%, with substantial performance degradation on noisy outdoor audio, underscoring the difficulty and realism of the benchmark. Additionally, we conduct a case study with two new SLLMs that perform inference with single-channel and multi-channel audio, demonstrating that multi-channel audio inputs significantly enhance model robustness to environmental noise and improve discrimination between device-directed and background speech. Our results highlight the critical importance of spatial audio cues for context-aware voice assistants and establish WearVox as a comprehensive testbed for advancing wearable voice AI research.

preprint, 2024, arXiv

Enhancement of Ising superconductivity in monolayer NbSe$_2$ via surface fluorination

Recently discovered Ising superconductors have garnered considerable interest due to their anomalously large in-plane upper critical fields ($B_{c2}$). However, the requisite strong spin-orbital coupling in the Ising pairing mechanism generally renders these superconductors heavy-element dominant with notably low superconducting transition temperatures ($T_c$). Here, based on the Migdal-Eliashberg theory and the mean-field Bogoliubov-de Gennes Hamiltonian, we demonstrate a significant enhancement of Ising superconductivity in monolayer NbSe$_2$ through surface fluorination, as evidenced by concomitant improvements in $T_c$ and $B_{c2}$. This enhancement arises from three predominant factors. Firstly, fluorine atoms symmetrically and stably adhere to both sides of the monolayer NbSe$_2$, thereby maintaining the out-of-plane mirror symmetry and locking carrier spins out-of-plane. Secondly, fluorination suppresses the charge density wave in monolayer NbSe$_2$ and induces a van Hove singularity in the vicinity of the Fermi level, leading to a marked increase in the number of carriers and, consequently, strengthening the electron-phonon coupling (EPC). Lastly, the appearance of fluorine-related, low-frequency phonon modes further augments the EPC. Our findings suggest a promising avenue to elevate $T_c$ in two-dimensional Ising superconductors without compromising their Ising pairing.

preprint, 2024, arXiv

Large deviation principle for a two-time-scale McKean-Vlasov model with jumps

This work focuses on the large deviation principle for a two-time-scale McKean-Vlasov system with jumps. Using the variational framework for McKean-Vlasov systems with jumps, the problem is reduced to the weak convergence of a controlled system. Unlike a general two-time-scale system, the controlled McKean-Vlasov system depends on the law of the original system, which complicates the qualitative analysis. This difficulty is handled efficiently by combining the asymptotics of the original system with a Khasminskii-type averaging principle. Finally, it is shown that the limit is related to the Dirac measure of the solution to the ordinary differential equation.

preprint, 2023, arXiv

Direct detonation initiation in hydrogen/air mixture: effects of compositional gradient and hotspot condition

Two-dimensional simulations are conducted to investigate the direct initiation of cylindrical detonation in hydrogen/air mixtures with detailed chemistry. The effects of hotspot condition and mixture composition gradient on detonation initiation are studied. Different hotspot pressures and compositions are first considered in the uniform mixture. It is found that detonation initiation fails for low hotspot pressures and the supercritical regime dominates at high hotspot pressures. Detonation is directly initiated from the reactive hotspot, whilst it is ignited somewhere beyond the nonreactive hotspots. Two cell diverging patterns (i.e., abrupt and gradual) are identified and the detailed mechanisms are analyzed. Moreover, cell coalescence occurs if many irregular cells are generated initially, which promotes local cell growth. We also consider nonuniform detonable mixtures. The results show that the initiated detonation experiences self-sustaining propagation, highly unstable propagation, and extinction in mixtures whose equivalence ratio decreases linearly along the radial direction from 1 to 0.9, 1 to 0.5, and 1 to 0, respectively. Moreover, the hydrodynamic structure analysis shows that, for the self-sustaining detonations, the hydrodynamic thickness increases at the overdriven stage, decreases as the cells are generated, and eventually becomes almost constant at the cell diverging stage, within which the sonic plane shows a sawtooth pattern. However, in the detonation extinction cases, the hydrodynamic thickness continuously increases, and no sawtooth sonic plane can be observed.

preprint, 2022, arXiv

A Survey on Incomplete Multi-view Clustering

Conventional multi-view clustering seeks to partition data into respective groups based on the assumption that all views are fully observed. However, in practical applications such as disease diagnosis, multimedia analysis, and recommendation systems, it is common that not all views of the samples are available, which causes conventional multi-view clustering methods to fail. Clustering on such incomplete multi-view data is referred to as incomplete multi-view clustering. In view of its promising application prospects, research on incomplete multi-view clustering has made noticeable advances in recent years. However, there is no survey that summarizes the current progress and points out future research directions. To this end, we review the recent studies of incomplete multi-view clustering. Importantly, we provide some frameworks to unify the corresponding incomplete multi-view clustering methods, and make an in-depth comparative analysis of some representative methods from theoretical and experimental perspectives. Finally, some open problems in the incomplete multi-view clustering field are offered for researchers.

preprint, 2022, arXiv

Collaborative Reflection-Augmented Autoencoder Network for Recommender Systems

As deep learning techniques have expanded to real-world recommendation tasks, many deep neural network based Collaborative Filtering (CF) models have been developed to project user-item interactions into a latent feature space, based on various neural architectures such as multi-layer perceptrons, auto-encoders and graph neural networks. However, the majority of existing collaborative filtering systems are not well designed to handle missing data. In particular, to inject negative signals in the training phase, these solutions largely rely on negative sampling from unobserved user-item interactions and simply treat them as negative instances, which degrades recommendation performance. To address these issues, we develop a Collaborative Reflection-Augmented Autoencoder Network (CRANet) that is capable of exploring transferable knowledge from observed and unobserved user-item interactions. The network architecture of CRANet is an integrative structure with a reflective receptor network and an information fusion autoencoder module, which endows our recommendation framework with the ability to encode users' implicit pairwise preferences on both interacted and non-interacted items. Additionally, a parametric regularization-based tied-weight scheme is designed to perform robust joint training of the two-stage CRANet model. Finally, we experimentally validate CRANet on four diverse benchmark datasets corresponding to two recommendation tasks, showing that debiasing the negative signals of user-item interactions improves performance compared to various state-of-the-art recommendation techniques. Our source code is available at https://github.com/akaxlh/CRANet.

preprint, 2022, arXiv

Contrastive Meta Learning with Behavior Multiplicity for Recommendation

A well-informed recommendation framework could not only help users identify items of interest, but also benefit the revenue of various online platforms (e.g., e-commerce, social media). Traditional recommendation models usually assume that only a single type of interaction exists between user and item, and fail to model the multiplex user-item relationships from multi-typed user behavior data, such as page view, add-to-favourite and purchase. While some recent studies propose to capture the dependencies across different types of behaviors, two important challenges have been less explored: i) Dealing with the sparse supervision signal under target behaviors (e.g., purchase). ii) Capturing the personalized multi-behavior patterns with customized dependency modeling. To tackle the above challenges, we devise a new model, Contrastive Meta Learning (CML), to maintain dedicated cross-type behavior dependency for different users. In particular, we propose a multi-behavior contrastive learning framework to distill transferable knowledge across different types of behaviors via the constructed contrastive loss. In addition, to capture the diverse multi-behavior patterns, we design a contrastive meta network to encode the customized behavior heterogeneity for different users. Extensive experiments on three real-world datasets indicate that our method consistently outperforms various state-of-the-art recommendation methods. Our empirical studies further suggest that the contrastive meta learning paradigm offers great potential for capturing the behavior multiplicity in recommendation. We release our model implementation at: https://github.com/weiwei1206/CML.git.

preprint, 2022, arXiv

Controllable chirality and band gap of quantum anomalous Hall insulators

Finding guiding principles to optimize properties of quantum anomalous Hall (QAH) insulators is of pivotal importance to fundamental science and applications. Here, we build a first-principles QAH material database of chirality and band gap, explore microscopic mechanisms determining the QAH material properties, and obtain a general physical picture that can comprehensively understand the QAH data. Our results reveal that the usually neglected Coulomb exchange is unexpectedly strong in a large class of QAH materials, which is the key to resolve experimental puzzles. Moreover, we identify simple indicators for property evaluation and suggest material design strategies to control QAH chirality and gap by tuning cooperative or competing contributions via magnetic co-doping, heterostructuring, spin-orbit proximity, etc. The work is valuable to future research of magnetic topological physics and materials.

preprint, 2022, arXiv

CTL-MTNet: A Novel CapsNet and Transfer Learning-Based Mixed Task Net for the Single-Corpus and Cross-Corpus Speech Emotion Recognition

Speech Emotion Recognition (SER) has become a growing focus of research in human-computer interaction. An essential challenge in SER is to extract common attributes from different speakers or languages, especially when a specific source corpus has to be trained to recognize the unknown data coming from another speech corpus. To address this challenge, a Capsule Network (CapsNet) and Transfer Learning based Mixed Task Net (CTL-MTNet) is proposed in this paper to deal with both the single-corpus and cross-corpus SER tasks simultaneously. For the single-corpus task, the combination of Convolution-Pooling and Attention CapsNet module (CPAC) is designed by embedding the self-attention mechanism into the CapsNet, guiding the module to focus on the important features that can be fed into different capsules. The high-level features extracted by CPAC provide sufficient discriminative ability. Furthermore, to handle the cross-corpus task, CTL-MTNet employs a Corpus Adaptation Adversarial Module (CAAM) by combining CPAC with Margin Disparity Discrepancy (MDD), which can learn the domain-invariant emotion representations through extracting the strong emotion commonness. Experiments including ablation studies and visualizations on both single- and cross-corpus tasks using four well-known SER datasets in different languages are conducted for performance evaluation and comparison. The results indicate that in both tasks the CTL-MTNet showed better performance in all cases compared to a number of state-of-the-art methods. The source code and the supplementary materials are available at: https://github.com/MLDMXM2017/CTLMTNet

preprint, 2022, arXiv

Deep-Learning Density Functional Theory Hamiltonian for Efficient ab initio Electronic-Structure Calculation

The marriage of density functional theory (DFT) and deep learning methods has the potential to revolutionize modern computational materials science. Here we develop a deep neural network approach to represent DFT Hamiltonian (DeepH) of crystalline materials, aiming to bypass the computationally demanding self-consistent field iterations of DFT and substantially improve the efficiency of ab initio electronic-structure calculations. A general framework is proposed to deal with the large dimensionality and gauge (or rotation) covariance of DFT Hamiltonian matrix by virtue of locality and is realized by the message passing neural network for deep learning. High accuracy, high efficiency and good transferability of the DeepH method are generally demonstrated for various kinds of material systems and physical properties. The method provides a solution to the accuracy-efficiency dilemma of DFT and opens opportunities to explore large-scale material systems, as evidenced by a promising application to study twisted van der Waals materials.

preprint, 2022, arXiv

Deniable Steganography

Steganography conceals the secret message into the cover media, generating a stego media which can be transmitted on public channels without drawing suspicion. As its countermeasure, steganalysis mainly aims to detect whether the secret message is hidden in a given media. Although the steganography techniques are improving constantly, the sophisticated steganalysis can always break a known steganographic method to some extent. With a stego media discovered, the adversary could find out the sender or receiver and coerce them to disclose the secret message, which we name as coercive attack in this paper. Inspired by the idea of deniable encryption, we build up the concepts of deniable steganography for the first time and discuss the feasible constructions for it. As an example, we propose a receiver-deniable steganographic scheme to deal with the receiver-side coercive attack using deep neural networks (DNN). Specifically, besides the real secret message, a piece of fake message is also embedded into the cover. On the receiver side, the real message can be extracted with an extraction module; while once the receiver has to surrender a piece of secret message under coercive attack, he can extract the fake message to deceive the adversary with another extraction module. Experiments demonstrate the scalability and sensitivity of the DNN-based receiver-deniable steganographic scheme.

preprint, 2022, arXiv

Distributed Newton Optimization with Maximized Convergence Rate

The distributed optimization problem is set up in a collection of nodes interconnected via a communication network. The goal is to find the minimizer of a global objective function formed by the addition of partial functions locally known at each node. A number of methods are available for addressing this problem, having different advantages. The goal of this work is to achieve the maximum possible convergence rate. As the first step towards this end, we propose a new method which we show converges faster than other available options. As with most distributed optimization methods, convergence rate depends on a step size parameter. As the second step towards our goal we complement the proposed method with a fully distributed method for estimating the optimal step size that maximizes convergence speed. We provide theoretical guarantees for the convergence of the resulting method in a neighborhood of the solution. Also, for the case in which the global objective function has a single local minimum, we provide a different step size selection criterion together with theoretical guarantees for convergence. We present numerical experiments showing that, when using the same step size, our method converges significantly faster than its rivals. Experiments also show that the distributed step size estimation method achieves an asymptotic convergence rate very close to the theoretical maximum.
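
The role of the step size can be illustrated with a minimal sketch. It is not the method proposed in the paper: it assumes quadratic local objectives f_i(x) = 0.5 x^T A_i x - b_i^T x, and it replaces the network communication with plain sums over the nodes. The function name `damped_newton` and all parameter names are hypothetical. Under these assumptions, each iteration shrinks the distance to the minimizer by the factor (1 - step_size).

```python
import numpy as np

def damped_newton(local_hessians, local_linear_terms, x0, step_size, iterations):
    """Minimise sum_i (0.5 x^T A_i x - b_i^T x) with a damped Newton iteration.

    Illustrative only: the sums over nodes are computed centrally here,
    standing in for whatever the communication network would compute.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iterations):
        # Each node i knows only its own A_i and b_i, hence its own gradient.
        gradient = sum(A @ x - b for A, b in zip(local_hessians, local_linear_terms))
        hessian = sum(local_hessians)
        # Damped Newton step: step_size = 1 is the full Newton step.
        x = x - step_size * np.linalg.solve(hessian, gradient)
    return x

# Two nodes with local quadratics; the global minimiser solves (A1 + A2) x = b1 + b2.
A1, b1 = np.diag([2.0, 1.0]), np.array([1.0, 0.0])
A2, b2 = np.diag([1.0, 3.0]), np.array([2.0, 4.0])
x_full = damped_newton([A1, A2], [b1, b2], x0=[5.0, -3.0], step_size=1.0, iterations=1)
x_half = damped_newton([A1, A2], [b1, b2], x0=[5.0, -3.0], step_size=0.5, iterations=3)
```

For a quadratic objective the full step reaches the minimizer in one iteration, while a step size of 0.5 halves the error each time, which is the simplest instance of the convergence rate depending on the step size.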

preprint, 2022, arXiv

Effects of dilute coal char particle suspensions on propagating methane detonation wave

Methane/coal dust hybrid explosion is one of the common hazards in process and mining industries. In this study, methane detonation propagation in dilute coal char particle suspensions is studied using an Eulerian-Lagrangian method. The focus is on the effects of char combustion on methane detonation dynamics. The results show that propagation of the methane detonation wave in coal particle suspensions is considerably affected by particle concentration and size. Detonation extinction occurs when the coal particle size is small and concentration is high. The averaged lead shock speed generally decreases with increased particle concentration and decreased particle size. Mean structure and interphase coupling of hybrid detonation are analysed, based on the gas and particle quantities. It is found that char combustion proceeds in the subsonic region behind the detonation wave and heat release is relatively distributed compared to that from gas phase reaction. The mass and energy transfer rates increase rapidly to the maximum near the reaction front in the induction zone. Moreover, for 1 μm particles, if the particle concentration is beyond a threshold value, detonation re-initiation occurs after it is quenched at the beginning of the coal dust suspensions. This is caused by hot spots from the shock focusing along the reaction front in a decoupled detonation and these shocks are generated from char combustion behind the lead shock.

preprint, 2022, arXiv

Elementary excitations in a spin-orbit-coupled spin-1 Bose-Einstein condensate

While a spin-orbit-coupled spin-1 Bose-Einstein condensate has been experimentally observed, its elementary excitations remain unclear in the stripe phase. Here, we systematically study the elementary excitations in three distinct phases of a spin-orbit-coupled spin-1 Bose-Einstein condensate. We find that the excitation spectrum as well as the corresponding static response function and structure factor depend strongly on spin-orbit coupling parameters such as the quadratic Zeeman field and the Rabi frequency. In the stripe phase, besides two gapless Goldstone modes, we show the existence of roton excitations. Finally, we demonstrate that quantum phase transitions between these different phases including the zero-momentum, plane wave and stripe phases are characterized by the sound velocities and the quantum depletion.

preprint, 2022, arXiv

Evolution of the electronic structure of ultrathin MnBi2Te4 Films

Ultrathin films of intrinsic magnetic topological insulator MnBi2Te4 exhibit fascinating quantum properties such as quantum anomalous Hall effect and axion insulator state. In this work, we systematically investigate the evolution of the electronic structure of MnBi2Te4 thin films. With increasing film thickness, the electronic structure changes from an insulator-type with a large energy gap to one with in-gap topological surface states, which is, however, still drastically different from the bulk material. By surface doping of alkali-metal atoms, a Rashba split band gradually emerges and hybridizes with topological surface states, which not only reconciles the puzzling difference between the electronic structures of the bulk and thin film MnBi2Te4 but also provides an interesting platform to establish Rashba ferromagnet that is attractive for (quantum) anomalous Hall effect. Our results provide important insights into the understanding and engineering of the intriguing quantum properties of MnBi2Te4 thin films.

preprint, 2022, arXiv

Exploring Separable Attention for Multi-Contrast MR Image Super-Resolution

Super-resolving the Magnetic Resonance (MR) image of a target contrast under the guidance of the corresponding auxiliary contrast, which provides additional anatomical information, is a new and effective solution for fast MR imaging. However, current multi-contrast super-resolution (SR) methods tend to concatenate different contrasts directly, ignoring their relationships in different clues, e.g., in the high-intensity and low-intensity regions. In this study, we propose a separable attention network (comprising high-intensity priority attention and low-intensity separation attention), named SANet. Our SANet could explore the areas of high-intensity and low-intensity regions in the "forward" and "reverse" directions with the help of the auxiliary contrast, while learning clearer anatomical structure and edge information for the SR of a target-contrast MR image. SANet provides three appealing benefits: (1) It is the first model to explore a separable attention mechanism that uses the auxiliary contrast to predict the high-intensity and low-intensity regions, diverting more attention to refining any uncertain details between these regions and correcting the fine areas in the reconstructed results. (2) A multi-stage integration module is proposed to learn the response of multi-contrast fusion at multiple stages, get the dependency between the fused representations, and boost their representation ability. (3) Extensive experiments with various state-of-the-art multi-contrast SR methods on fastMRI and clinical in vivo datasets demonstrate the superiority of our model.

preprint, 2022, arXiv

Extinction and re-initiation of methane detonation in dilute coal particle suspensions

In this study, methane detonation propagation in dilute coal particle suspensions is studied using an Eulerian-Lagrangian method. A two-dimensional configuration is considered, and a skeletal chemical mechanism (24 species and 104 reactions) is applied for methane combustion. The gas and particulate phase equations are solved using an OpenFOAM code for two-phase compressible reacting flow, RYrhoCentralFOAM. The effects of char combustion on methane detonation dynamics are investigated and devolatilized coal particles are modelled. The results show that propagation of the methane detonation wave in coal particle suspensions is considerably affected by coal particle concentration and size. Detonation extinction occurs when the coal particle size is small and concentration is high. The averaged lead shock speed generally decreases with increased particle concentration and decreased particle size. Mean structure of methane and coal particle hybrid detonation is analysed, based on the gas and particle quantities. It is found that char combustion proceeds in the subsonic region behind the detonation wave and heat release is relatively distributed compared to that from gas phase reaction. Moreover, for 1 μm particles, if the particle concentration is beyond a threshold value, detonation re-initiation occurs after it is quenched at the beginning of the coal dust suspensions. This is caused by hot spots from the shock focusing along the reaction front in a decoupled detonation and these shocks are generated from char combustion behind the lead shock. A regime map of detonation propagation and extinction is predicted. It is found that the re-initiation location decreases with the particle concentration and approaches a constant value when the concentration exceeds 1000 g/m³. The results from this study are useful for prevention and suppression of methane/coal dust hybrid explosion.

preprint, 2022, arXiv

Fine-Grained Object Classification via Self-Supervised Pose Alignment

Semantic patterns of fine-grained objects are determined by subtle appearance differences of local parts, which has inspired a number of part-based methods. However, due to uncontrollable object poses in images, distinctive details carried by local regions can be spatially distributed or even self-occluded, leading to a large variation in object representation. To discount pose variations, this paper proposes to learn a novel graph-based object representation that reveals a global configuration of local parts for self-supervised pose alignment across classes, which is employed as an auxiliary feature regularization on a deep representation learning network. Moreover, a coarse-to-fine supervision together with the proposed pose-insensitive constraint on shallow-to-deep sub-networks encourages discriminative features in a curriculum learning manner. We evaluate our method on three popular fine-grained object classification benchmarks, consistently achieving state-of-the-art performance. Source codes are available at https://github.com/yangxh11/P2P-Net.

preprint, 2022, arXiv

Global-Supervised Contrastive Loss and View-Aware-Based Post-Processing for Vehicle Re-Identification

In this paper, we propose a Global-Supervised Contrastive loss and a view-aware-based post-processing (VABPP) method for vehicle re-identification. The traditional supervised contrastive loss calculates feature distances within the batch, so it is local in nature. In contrast, the proposed Global-Supervised Contrastive loss is global: the positive and negative features of each anchor during training come from the entire training set. The proposed VABPP method is the first use of a view-aware-based method as a post-processing step in vehicle re-identification. VABPP has two advantages. First, it is used only during testing and does not affect the training process. Second, as a post-processing method, it can be easily integrated into other trained re-id models. We directly apply the view-pair distance scaling coefficient matrix calculated by the model trained in this paper to another trained re-id model, and the VABPP method greatly improves its performance, which verifies the feasibility of the VABPP method.
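
For reference, the in-batch behaviour that the abstract calls local can be sketched as follows. This is the commonly used supervised contrastive formulation, not the paper's Global-Supervised variant, whose details the abstract does not give. The function name, the NumPy implementation, and the default temperature are assumptions made for illustration.

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """In-batch supervised contrastive loss (standard form, illustrative only)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # L2-normalise so that dot products are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = z @ z.T / temperature
    self_mask = np.eye(len(labels), dtype=bool)
    # An anchor never serves as its own positive or negative.
    logits = np.where(self_mask, -np.inf, logits)
    # Log-softmax over the other samples of the same batch only.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Positives are the other batch samples that share the anchor's label.
    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    has_pos = positives.any(axis=1)
    if not has_pos.any():
        return 0.0
    pos_sum = np.where(positives, log_prob, 0.0).sum(axis=1)
    mean_pos = pos_sum[has_pos] / positives.sum(axis=1)[has_pos]
    return float(-mean_pos.mean())
```

Every positive and negative in this sketch is drawn from the current batch, which is the locality the abstract refers to. A global variant would draw them from features of the whole training set instead; how those features are stored and refreshed is not stated in the abstract.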

preprint2022arXiv

Hypergraph Contrastive Collaborative Filtering

Collaborative Filtering (CF) has emerged as a fundamental paradigm for parameterizing users and items into a latent representation space, with their correlative patterns learned from interaction data. Among various CF techniques, GNN-based recommender systems, e.g., PinSage and LightGCN, have offered state-of-the-art performance. However, two key challenges have not been well explored in existing solutions: i) The over-smoothing effect of deeper graph-based CF architectures may cause indistinguishable user representations and degraded recommendation results. ii) The supervision signals (i.e., user-item interactions) are usually scarce and skewed in distribution in reality, which limits the representation power of CF paradigms. To tackle these challenges, we propose a new self-supervised recommendation framework, Hypergraph Contrastive Collaborative Filtering (HCCF), to jointly capture local and global collaborative relations with a hypergraph-enhanced cross-view contrastive learning architecture. In particular, the designed hypergraph structure learning enhances the discrimination ability of the GNN-based CF paradigm, so as to comprehensively capture the complex high-order dependencies among users. Additionally, our HCCF model effectively integrates hypergraph structure encoding with self-supervised learning to reinforce the representation quality of recommender systems, based on hypergraph-enhanced self-discrimination. Extensive experiments on three benchmark datasets demonstrate the superiority of our model over various state-of-the-art recommendation methods, and its robustness against sparse user interaction data. Our model implementation code is available at https://github.com/akaxlh/HCCF.

preprint2022arXiv

Ignition limit and shock-to-detonation transition mode of n-heptane/air mixture in high-speed wedge flows

In this work, oblique detonation of n-heptane/air mixture in high-speed wedge flows is simulated by solving the reactive Euler equations with a two-dimensional (2D) configuration. This is a first attempt to model oblique detonation waves (ODWs) of a complicated hydrocarbon fuel with detailed chemistry (44 species and 112 reactions). Effects of freestream equivalence ratios and velocities are considered, and both abrupt and smooth transitions from oblique shock to detonation are predicted. Ignition limit, ODW characteristics, and predictability of the transition mode are discussed. Firstly, homogeneous constant-volume ignition calculations are performed for both fuel-lean and stoichiometric mixtures. The results show that the ignition delay generally increases with the wedge angle. However, a negative wedge angle dependence is observed, due to the negative temperature coefficient effects. The wedge angle range for successful ignition of n-heptane/air mixtures decreases when the wedge length is reduced. From 2D simulations of stationary ODWs, the initiation length generally decreases with the freestream equivalence ratio, but the transition length exhibits weakly non-monotonic dependence. Smooth ODW typically occurs for lean conditions (equivalence ratio < 0.4). The interactions between shock / compression waves and chemical reaction inside the induction zone are also studied with the chemical explosive mode analysis. Moreover, the predictability of the shock-to-detonation transition mode is explored through quantifying the relation between ignition delay and chemical excitation time. It is demonstrated that the ignition delay (excitation time) increases (decreases) with the freestream equivalence ratio for the three studied oncoming flow velocities. Smaller excitation time corresponds to stronger pressure waves from the ignition location behind the oblique shock wave (OSW).

preprint2022arXiv

Interactions between a propagating detonation wave and circular water cloud in hydrogen/air mixture

Interactions between a propagating hydrogen/air detonation wave and circular water cloud are studied. Eulerian Lagrangian method involving two-way gas-droplet coupling is applied. Different droplet (diameter, concentration) and cloud (diameter) properties are considered. Results show that droplet size, concentration and cloud radius have significant effects on peak pressure trajectory of the detonation wave. Three propagation modes are identified: perturbed propagation, leeward re-detonation, and detonation extinction. Leeward re-detonation is analyzed from unsteady evolutions of gas and liquid droplet quantities. The detonation is re-initiated by a local hot spot from shock focusing of upper and lower diffracted detonations. Disintegration of water droplets proceeds when the detonation wave crosses the cloud. In addition, detonation extinction is featured by quickly fading peak pressure trajectories when the detonation wave passes the larger cloud, and no local autoignition occurs in the shock focusing area. Evolutions of thermochemical structures from the shocked area in an extinction process are also studied. The transfer rates of mass, energy and momentum of detonation success and failure are analyzed. Moreover, parametric studies demonstrate that the critical cloud size to quench a detonation decreases when the droplet concentration is increased. However, when the droplet concentration is beyond 0.84 kg/m3, the critical cloud size is negligibly influenced due to small droplets. Two-phase fluid interfacial instability is observed, and the mechanism of cloud evolution is studied with the distributions of droplet, vorticity, density / pressure gradient magnitudes, and gas velocity.

preprint2022arXiv

Interactions between a propagating detonation wave and water spray cloud in hydrogen/air mixture

Inhibition of hydrogen explosion is crucial to realizing its wide application, and fine water spray is an ideal mitigant due to numerous advantages. In this work, interactions between a propagating hydrogen/air detonation wave and a circular water cloud are numerically studied. An Eulerian-Lagrangian method involving two-way gas-droplet coupling is applied, with a two-dimensional configuration. Different droplet (diameter, concentration) and cloud (diameter) properties are considered. Our results show that droplet size, concentration and cloud radius have significant effects on the peak pressure trajectory of the detonation wave. After interacting with the cloud, the detonation wave exhibits three propagation modes: perturbed propagation, leeward re-detonation, and detonation extinction. Leeward re-detonation is analyzed from unsteady evolutions of gas and liquid droplet quantities. The refracted detonation wave inside the cloud is decoupled and propagates more slowly than the one outside the cloud. The detonation re-initiation is from a local hot spot, caused by shock focusing from upper and lower diffracted detonations. Disintegration of water droplets proceeds when the detonation wave crosses the cloud, and multiphase interfacial instability is observed due to the difference in effective density of the two fluids. Furthermore, detonation extinction is observed when we consider various water cloud sizes. It is characterized by quickly fading peak pressure trajectories when the detonation passes the cloud, and no local autoignition occurs in the shock focusing area. Evolutions of thermochemical structures from the shocked area in an extinction process are also studied. Moreover, parametric studies considering various droplet concentrations and cloud radii are performed.

preprint2022arXiv

Joint Neural AEC and Beamforming with Double-Talk Detection

Acoustic echo cancellation (AEC) in full-duplex communication systems eliminates acoustic feedback. However, nonlinear distortions induced by audio devices, background noise, reverberation, and double-talk reduce the efficiency of conventional AEC systems. Several hybrid AEC models have been proposed to address this, which use deep learning models to suppress residual echo from standard adaptive filtering. This paper proposes a deep learning-based joint AEC and beamforming model (JAECBF) building on our previous self-attentive recurrent neural network (RNN) beamformer. The proposed network consists of two modules: (i) a multi-channel neural AEC, and (ii) a joint AEC-RNN beamformer with double-talk detection (DTD) that computes time-frequency (T-F) beamforming weights. We train the proposed model end to end to eliminate background noise and echoes from far-end audio devices, which include nonlinear distortions. From experimental evaluations, we find the proposed network outperforms other multi-channel AEC and denoising systems in terms of speech recognition rate and overall speech quality.

preprint2022arXiv

Molecular conformer search with low-energy latent space

Identifying low-energy conformers with quantum mechanical accuracy for molecules with many degrees of freedom is challenging. In this work, we use the molecular dihedral angles as features and explore the possibility of performing molecular conformer search in a latent space with a generative model, the variational auto-encoder (VAE). We bias the VAE towards low-energy molecular configurations to generate more informative data. In this way, we can effectively build a reliable energy model for the low-energy potential energy surface. After the energy model has been built, we extract local-minimum conformations and refine them with structure optimization. We have tested and benchmarked our low-energy latent-space (LOLS) structure search method on organic molecules with 5 to 9 search dimensions. Our results agree with previous studies.

preprint2022arXiv

Multi-Behavior Enhanced Recommendation with Cross-Interaction Collaborative Relation Modeling

Many previous studies aim to augment collaborative filtering with deep neural network techniques, so as to achieve better recommendation performance. However, most existing deep learning-based recommender systems are designed to model a single type of user-item interaction behavior, and can hardly distill the heterogeneous relations between users and items. In practical recommendation scenarios, there exist multi-typed user behaviors, such as browse and purchase. Because they overlook users' multi-behavioral patterns over different items, existing recommendation methods are insufficient to capture heterogeneous collaborative signals from user multi-behavior data. Inspired by the strength of graph neural networks for structured data modeling, this work proposes a Graph Neural Multi-Behavior Enhanced Recommendation (GNMR) framework which explicitly models the dependencies between different types of user-item interactions under a graph-based message passing architecture. GNMR devises a relation aggregation network to model interaction heterogeneity, and recursively performs embedding propagation between neighboring nodes over the user-item interaction graph. Experiments on real-world recommendation datasets show that our GNMR consistently outperforms state-of-the-art methods. The source code is available at https://github.com/akaxlh/GNMR.

preprint2022arXiv

Multi-Behavior Sequential Recommendation with Temporal Graph Transformer

Modeling the time-evolving preferences of users from their sequential item interactions has attracted increasing attention in many online applications. Hence, sequential recommender systems have been developed to learn dynamic user interests from historical interactions for suggesting items. However, the interaction pattern encoding functions in most existing sequential recommender systems have focused on a single type of user-item interaction. In many real-life online platforms, user-item interactive behaviors are often multi-typed (e.g., click, add-to-favorite, purchase) with complex cross-type behavior inter-dependencies. Learning informative representations of users and items from their multi-typed interaction data is of great importance for accurately characterizing time-evolving user preferences. In this work, we tackle dynamic user-item relation learning with awareness of multi-behavior interactive patterns. To this end, we propose a new Temporal Graph Transformer (TGT) recommendation framework to jointly capture dynamic short-term and long-range user-item interactive patterns, by exploring the evolving correlations across different types of behaviors. The new TGT method enables the sequential recommendation architecture to distill dedicated knowledge for type-specific behavior relational context and the implicit behavior dependencies. Experiments on real-world datasets indicate that our method TGT consistently outperforms various state-of-the-art recommendation methods. Our model implementation code is available at https://github.com/akaxlh/TGT.

preprint2022arXiv

Multi-Modal Transformer for Accelerated MR Imaging

Accelerated multi-modal magnetic resonance (MR) imaging is a new and effective solution for fast MR imaging, providing superior performance in restoring the target modality from its undersampled counterpart with guidance from an auxiliary modality. However, existing works simply combine the auxiliary modality as prior information, lacking in-depth investigations of the potential mechanisms for fusing different modalities. Further, they usually rely on convolutional neural networks (CNNs), which are limited by their intrinsic locality in capturing long-distance dependencies. To this end, we propose a multi-modal transformer (MTrans), which is capable of transferring multi-scale features from the target modality to the auxiliary modality, for accelerated MR imaging. To capture deep multi-modal information, our MTrans utilizes an improved multi-head attention mechanism, named the cross attention module, which absorbs features from the auxiliary modality that contribute to the target modality. Our framework provides three appealing benefits: (i) MTrans uses an improved transformer for multi-modal MR imaging, affording more global information compared with existing CNN-based methods. (ii) A new cross attention module is proposed to exploit the useful information in each modality at different scales. The small patch in the target modality aims to keep more fine details, while the large patch in the auxiliary modality aims to obtain high-level context features from the larger region and supplement the target modality effectively. (iii) We evaluate MTrans on various accelerated multi-modal MR imaging tasks, e.g., MR image reconstruction and super-resolution, where MTrans outperforms state-of-the-art methods on fastMRI and real-world clinical datasets.

preprint2022arXiv

NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement

Acoustic echo cancellation (AEC) plays an important role in full-duplex speech communication as well as in front-end speech enhancement for recognition when the loudspeaker is playing back. In this paper, we present an all-deep-learning framework that implicitly estimates the second-order statistics of echo/noise and target speech, and jointly solves echo and noise suppression through an attention-based recurrent neural network. The proposed model outperforms the state-of-the-art joint echo cancellation and speech enhancement method F-T-LSTM in terms of objective speech quality metrics, speech recognition accuracy and model complexity. We show that this model can work with speaker embedding for better target speech enhancement, and furthermore develop a branch for the automatic gain control (AGC) task to form an all-in-one front-end speech enhancement system.

preprint2022arXiv

Non-Hermitian Absorption Spectroscopy

While non-Hermitian Hamiltonians have been experimentally realized in cold atom systems, it remains an outstanding open question of how to experimentally measure their complex energy spectra in momentum space for a realistic system with boundaries. The existence of non-Hermitian skin effects may make the question even more difficult to address given the fact that energy spectra for a system with open boundaries are dramatically different from those in momentum space; the fact may even lead to the notion that momentum-space band structures are not experimentally accessible for a system with open boundaries. Here, we generalize the widely used radio-frequency spectroscopy to measure both real and imaginary parts of complex energy spectra of a non-Hermitian quantum system for either bosonic or fermionic atoms. By weakly coupling the energy levels of a non-Hermitian system to auxiliary energy levels, we theoretically derive a formula showing that the decay of atoms on the auxiliary energy levels reflects the real and imaginary parts of energy spectra in momentum space. We further prove that measurement outcomes are independent of boundary conditions in the thermodynamic limit, providing strong evidence that the energy spectrum in momentum space is experimentally measurable. We finally apply our non-Hermitian absorption spectroscopy protocol to the Hatano-Nelson model and non-Hermitian Weyl semimetals to demonstrate its feasibility.

preprint2022arXiv

On the evolutions of induction zone structure in wedge-stabilized oblique detonation with water mist flows

Two-dimensional wedge-stabilized oblique detonations in stoichiometric and fuel-lean H2/O2/Ar mixtures with water mists are studied with an Eulerian-Lagrangian method. The effects of water droplet mass flow rate on flow and chemical structures in the induction zone, as well as physical / chemical roles of water vapor, are investigated. The results show that the oblique detonation wave (ODW) can stand in a range of water mass flow rates for both stoichiometric and fuel-lean mixtures. With increased droplet mass flow rate, the deflagration front in the induction zone is distorted and becomes zigzagged, but the transition mode from oblique shock wave (OSW) to ODW does not change. Moreover, the initiation and transition locations monotonically increase, and the OSW and ODW angles decrease, due to droplet evaporation and water vapor dilution in the induction region. For fuel-lean mixtures, the sensitivity of characteristic locations to the droplet loading variations is mild, which signifies better intrinsic stability and resilience to the oncoming water droplets. The chemical explosiveness of the gaseous mixture between the lead shock and reaction front is studied with the chemical explosive mode analysis. The smooth transition is caused by the highly enhanced reactivity of the gas immediately behind the curved shock, intensified by the compression waves. Nonetheless, the abrupt transition results from the intersection between the previously generated detonation wave in the induction zone and the OSW. Besides, the degree to which the gas chemical reactivity in the induction zone for fuel-lean mixtures is reduced by evaporating droplets is generally lower than that for stoichiometric gas. Also, physical and chemical effects of water vapor from liquid droplets result in significant differences in ODW initiation and morphology.

preprint2022arXiv

Spatial-Temporal Hypergraph Self-Supervised Learning for Crime Prediction

Crime has become a major concern in many cities, creating a rising demand for timely prediction of citywide crime occurrence. Accurate crime prediction results are vital for advance decision-making by governments to alleviate increasing concern about public safety. While many efforts have been devoted to proposing various spatial-temporal forecasting techniques to explore dependence across locations and time periods, most of them follow a supervised learning manner, which limits their spatial-temporal representation ability on sparse crime data. Inspired by the recent success of self-supervised learning, this work proposes a Spatial-Temporal Hypergraph Self-Supervised Learning framework (ST-HSL) to tackle the label scarcity issue in crime prediction. Specifically, we propose cross-region hypergraph structure learning to encode region-wise crime dependency across the entire urban space. Furthermore, we design a dual-stage self-supervised learning paradigm, to not only jointly capture local- and global-level spatial-temporal crime patterns, but also supplement the sparse crime representation by augmenting region self-discrimination. We perform extensive experiments on two real-life crime datasets. Evaluation results show that our ST-HSL significantly outperforms state-of-the-art baselines. Further analysis provides insights into the superiority of our ST-HSL method in the representation of spatial-temporal crime patterns. The implementation code is available at https://github.com/LZH-YS1998/STHSL.

preprint2022arXiv

Spatial-Temporal Sequential Hypergraph Network for Crime Prediction with Dynamic Multiplex Relation Learning

Crime prediction is crucial for public safety and resource optimization, yet is very challenging due to two aspects: i) the dynamics of criminal patterns across time and space, with crime events distributed unevenly in both the spatial and temporal domains; ii) time-evolving dependencies between different types of crimes (e.g., Theft, Robbery, Assault, Damage), which reveal fine-grained semantics of crimes. To tackle these challenges, we propose the Spatial-Temporal Sequential Hypergraph Network (ST-SHN) to collectively encode complex crime spatial-temporal patterns as well as the underlying category-wise crime semantic relationships. Specifically, to handle spatial-temporal dynamics under the long-range and global context, we design a graph-structured message passing architecture with the integration of the hypergraph learning paradigm. To capture category-wise crime heterogeneous relations in a dynamic environment, we introduce a multi-channel routing mechanism to learn the time-evolving structural dependency across crime types. We conduct extensive experiments on two real-world datasets, showing that our proposed ST-SHN framework can significantly improve the prediction performance as compared to various state-of-the-art baselines. The source code is available at: https://github.com/akaxlh/ST-SHN.

preprint2022arXiv

Specificity-Preserving Federated Learning for MR Image Reconstruction

Federated learning (FL) can be used to improve data privacy and efficiency in magnetic resonance (MR) image reconstruction by enabling multiple institutions to collaborate without needing to aggregate local data. However, the domain shift caused by different MR imaging protocols can substantially degrade the performance of FL models. Recent FL techniques tend to solve this by enhancing the generalization of the global model, but they ignore the domain-specific features, which may contain important information about the device properties and be useful for local reconstruction. In this paper, we propose a specificity-preserving FL algorithm for MR image reconstruction (FedMRI). The core idea is to divide the MR reconstruction model into two parts: a globally shared encoder to obtain a generalized representation at the global level, and a client-specific decoder to preserve the domain-specific properties of each client, which is important for collaborative reconstruction when the clients have unique distributions. Such a scheme is then executed in the frequency space and the image space respectively, allowing exploration of generalized representation and client-specific properties simultaneously in different spaces. Moreover, to further boost the convergence of the globally shared encoder when a domain shift is present, a weighted contrastive regularization is introduced to directly correct any deviation between the client and server during optimization. Extensive experiments demonstrate that FedMRI's reconstructed results are the closest to the ground truth for multi-institutional data, and that it outperforms state-of-the-art FL methods.

preprint2022arXiv

Superradiant Production of Heavy Dark Matter from Primordial Black Holes

Rotating black holes (BHs) can efficiently transfer energy to the surrounding environment via superradiance. In particular, when the Compton length of a particle is comparable to the gravitational radius of a BH, the particle's occupation number can be exponentially amplified. In this work, we investigate the effect of the primordial-black-hole (PBH) superradiant instabilities on the generation of heavy bosonic dark matter (DM) with mass above $\sim$ 1 TeV. Additionally, we analyze its interplay with other purely gravitational and therefore unavoidable DM production mechanisms such as Hawking emission and the ultraviolet freeze-in. We find that superradiance can significantly increase the DM density produced by PBHs with respect to the case that only considers Hawking emission, and hence lower initial PBH densities are required.

preprint2022arXiv

Topological Quantum Phase Transitions in Metallic Shiba Lattices

Shiba bands formed by overlapping Yu-Shiba-Rusinov subgap states in magnetic impurities on a superconductor play an important role in topological superconductors. Here, we theoretically demonstrate the existence of a new type of Shiba bands (dubbed topological Shiba metal) on a magnetically doped $s$-wave superconducting surface with Rashba spin-orbit coupling in the presence of a weak in-plane magnetic field. Such topological gapless Shiba bands develop from gapped Shiba bands through Lifshitz phase transitions accompanied by second-order quantum phase transitions for the intrinsic thermal Hall conductance. We also find a mechanism in Shiba lattices that protects the first-order quantum phase transitions for the intrinsic thermal Hall conductance. Due to the long-range hopping in Shiba lattices, the topological Shiba metal exhibits intrinsic thermal Hall conductance with large nonquantized values. As a consequence, a large number of second-order quantum phase transitions emerge.

preprint2022arXiv

Ultraviolet Freeze-in with a Time-dependent Inflaton Decay

It is typically assumed that during reheating the inflaton decays with a constant decay width. However, this is not guaranteed and can have a strong impact on the dark matter (DM) genesis. In the context of the ultraviolet (UV) freeze-in mechanism, if the operators connecting the dark and visible sectors are of sufficiently high mass dimension, the bulk of the DM abundance is produced during and not after reheating. We study here the impact of a time-dependent decay width of the inflaton on the DM abundance, emphasizing the differences with respect to the cases where the decay is either instantaneous or constant. We also provide concrete examples for DM production via UV freeze-in, e.g., from 2-to-2 scatterings of standard model particles, or from inflaton scatterings or decays, elucidating how the time-dependence influences the DM yield.

preprint2022arXiv

UniParser: A Unified Log Parser for Heterogeneous Log Data

Logs provide first-hand information for engineers to diagnose failures in large-scale online service systems. Log parsing, which transforms semi-structured raw log messages into structured data, is a prerequisite of automated log analysis such as log-based anomaly detection and diagnosis. Almost all existing log parsers follow the general idea of extracting the common part as templates and the dynamic part as parameters. However, these log parsing methods often neglect the semantic meaning of log messages. Furthermore, high diversity among various log sources also poses an obstacle to the generalization of log parsing across different systems. In this paper, we propose UniParser to capture the common logging behaviours from heterogeneous log data. UniParser utilizes a Token Encoder module and a Context Encoder module to learn the patterns from the log token and its neighbouring context. A Context Similarity module is specially designed to model the commonalities of learned patterns. We have performed extensive experiments on 16 public log datasets and our results show that UniParser outperforms state-of-the-art log parsers by a large margin.
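The template/parameter split that the abstract attributes to existing log parsers can be sketched in a few lines of Python. This is a toy rule-based illustration, not UniParser, which learns token and context patterns; the regular expression, the `<*>` placeholder, and the sample log lines are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical rule for "dynamic" tokens: numbers, dotted numbers with an
# optional port, hexadecimal values, and absolute paths.
DYNAMIC_TOKEN = re.compile(r"^(?:\d+(?:\.\d+)*(?::\d+)?|0x[0-9a-fA-F]+|/\S+)$")

def parse(line):
    """Split a raw log line into (template, parameters)."""
    template, params = [], []
    for token in line.split():
        if DYNAMIC_TOKEN.match(token):
            template.append("<*>")   # dynamic part becomes a placeholder
            params.append(token)     # and is kept as a parameter
        else:
            template.append(token)   # common part stays in the template
    return " ".join(template), params

logs = [
    "Connection from 10.0.0.5:8080 closed after 35 ms",
    "Connection from 10.0.0.9:443 closed after 120 ms",
    "Failed to open /var/log/app.log",
]

# Group the extracted parameters by the template they belong to.
groups = defaultdict(list)
for line in logs:
    template, params = parse(line)
    groups[template].append(params)
```

The first two lines collapse into one template, `Connection from <*> closed after <*> ms`. Hand-written rules of this kind ignore what the tokens mean, which is the limitation the abstract points to.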

preprint2021arXiv

ADL-MVDR: All deep learning MVDR beamformer for target speech separation

Speech separation algorithms are often used to separate the target speech from other interfering sources. However, purely neural network based speech separation systems often cause nonlinear distortion that is harmful for automatic speech recognition (ASR) systems. The conventional mask-based minimum variance distortionless response (MVDR) beamformer can be used to minimize the distortion, but comes with a high level of residual noise. Furthermore, the matrix operations (e.g., matrix inversion) involved in the conventional MVDR solution are sometimes numerically unstable when jointly trained with neural networks. In this paper, we propose a novel all deep learning MVDR framework, where the matrix inversion and eigenvalue decomposition are replaced by two recurrent neural networks (RNNs), to resolve both issues at the same time. The proposed method can greatly reduce the residual noise while keeping the target speech undistorted by leveraging the RNN-predicted frame-wise beamforming weights. The system is evaluated on a Mandarin audio-visual corpus and compared against several state-of-the-art (SOTA) speech separation systems. Experimental results demonstrate the superiority of the proposed method across several objective metrics and ASR accuracy.
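For orientation, the conventional MVDR solution mentioned above is usually written in the textbook form below. The notation is an assumption rather than the paper's own: $\boldsymbol{\Phi}_{\mathbf{NN}}(f)$ denotes the noise spatial covariance matrix, $\mathbf{v}(f)$ the steering vector of the target (often taken as the principal eigenvector of the speech covariance), and $\mathbf{Y}(t,f)$ the multi-channel mixture spectrum.

```latex
\mathbf{w}_{\mathrm{MVDR}}(f)
  = \frac{\boldsymbol{\Phi}_{\mathbf{NN}}^{-1}(f)\,\mathbf{v}(f)}
         {\mathbf{v}^{\mathsf{H}}(f)\,\boldsymbol{\Phi}_{\mathbf{NN}}^{-1}(f)\,\mathbf{v}(f)},
\qquad
\hat{S}(t,f) = \mathbf{w}_{\mathrm{MVDR}}^{\mathsf{H}}(f)\,\mathbf{Y}(t,f)
```

The matrix inverse and the eigendecomposition behind $\mathbf{v}(f)$ are the two operations the abstract describes as being replaced by RNNs, and the constraint $\mathbf{w}^{\mathsf{H}}\mathbf{v} = 1$ is what makes the beamformer distortionless for the target direction.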

preprint2021arXiv

ALP Dark Matter in a Primordial Black Hole Dominated Universe

We investigate the phenomenological consequences of axion-like particle (ALP) dark matter with an early matter domination triggered by primordial black holes (PBHs). We focus on light BHs with masses smaller than $\sim 10^9~$g which fully evaporate before Big Bang nucleosynthesis. We numerically solve the coupled Boltzmann equations, carefully taking the greybody factors and BH angular momentum into account. We find that the entropy injection from PBH evaporation dilutes the ALP relic abundance originally produced via the vacuum misalignment mechanism, opening the parameter space with larger scales $f_a$ or, equivalently, smaller ALP-photon couplings $g_{a\gamma}$, within the reach of future detectors such as ABRACADABRA, KLASH, ADMX, and DM-Radio. Moreover, the ALP minicluster masses can be several orders of magnitude larger if the early Universe features a PBH-dominated epoch. For the relativistic ALPs produced directly from Hawking radiation, we find that their contribution to the dark radiation is within the sensitivity of next-generation CMB experiments. For the sake of completeness, we also revisit the particular case of the QCD axion.

preprint2021arXiv

Basic formulation and first-principles implementation of nonlinear magneto-optical effects

First-principles calculation of nonlinear magneto-optical effects has become an indispensable tool to reveal the geometric and topological nature of electronic states and to understand light-matter interactions. While intriguingly rich physics could emerge in magnetic materials, further methodological developments are required to deal with time-reversal symmetry breaking, due to the degeneracy and gauge problems caused by symmetry and the low-frequency divergence problem in the existing calculation formalism. Here we present a gauge-covariant and low-frequency convergent formalism for the first-principles computation. Remarkably, this formalism generally works for both non-magnetic and magnetic materials with or without band degeneracy. Reliability and capability of our method are demonstrated by studying example materials (i.e., bilayers of MnBi$_2$Te$_4$ and CrI$_3$) and comparing with published results. Moreover, an important correction term that ensures gauge covariance of degenerate states is derived, whose influence on physical responses is systematically checked. Our method enables computation of nonlinear magneto-optical effects in magnetic materials and paves the way for exploring rich physics created by the interplay of light and magnetism.

preprint2021arXiv

FWB-Net: Front White Balance Network for Color Shift Correction in Single Image Dehazing via Atmospheric Light Estimation

In recent years, single image dehazing deep models based on the Atmospheric Scattering Model (ASM) have achieved remarkable results. But the dehazing outputs of those models suffer from color shift. Analyzing the ASM shows that the atmospheric light factor (ALF) is set as a scalar, which indicates that the ALF is constant for the whole image. However, for images taken in the real world, the illumination is not uniformly distributed over the whole image, which brings model mismatch and possibly results in color shift in deep models using the ASM. Bearing this in mind, in this study, first, a new non-homogeneous atmospheric scattering model (NH-ASM) is proposed for improving image modeling of hazy images taken under complex illumination conditions. Second, a new U-Net based front white balance module (FWB-Module) is designed to correct color shift before generating the dehazing result via atmospheric light estimation. Third, a new FWB loss is developed for training the FWB-Module, which imposes a penalty on color shift. In the end, based on NH-ASM and front white balance technology, an end-to-end CNN-based color-shift-restraining dehazing network is developed, termed FWB-Net. Experimental results demonstrate the effectiveness and superiority of our proposed FWB-Net for dehazing on both synthetic and real-world images.
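The model mismatch described in this abstract can be illustrated with the standard atmospheric scattering model, I = J·t + A·(1 − t). The sketch below is not the paper's NH-ASM or FWB-Module; it is a minimal per-pixel illustration, with hypothetical pixel values and function names, of how inverting the model with a single scalar atmospheric light leaves a color shift when the true local atmospheric light is tinted.

```python
def apply_haze(clear, t, airlight):
    """Standard atmospheric scattering model per pixel: I = J * t + A * (1 - t)."""
    return tuple(j * t + a * (1.0 - t) for j, a in zip(clear, airlight))


def invert_haze(hazy, t, airlight):
    """Solve the same model for J, given transmission t and an airlight estimate."""
    return tuple((i - a) / t + a for i, a in zip(hazy, airlight))


# Hypothetical values chosen only for illustration.
clear_pixel = (0.40, 0.50, 0.60)    # haze-free RGB value J
t = 0.6                             # transmission at this pixel
local_airlight = (0.9, 0.8, 0.7)    # warm-tinted local atmospheric light
global_airlight = (0.8, 0.8, 0.8)   # one scalar value used for every channel

hazy_pixel = apply_haze(clear_pixel, t, local_airlight)

# Inverting with the true local airlight recovers the clear pixel.
recovered_local = invert_haze(hazy_pixel, t, local_airlight)

# Inverting with the scalar airlight leaves a residual per-channel error.
recovered_global = invert_haze(hazy_pixel, t, global_airlight)
color_shift = tuple(r - c for r, c in zip(recovered_global, clear_pixel))
```

Under these assumptions the residual works out to (1 − t)/t times the difference between the local and the assumed atmospheric light, so the red channel is pushed up and the blue channel down while green is unaffected.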

preprint2021arXiv

Graph Meta Network for Multi-Behavior Recommendation

Modern recommender systems often embed users and items into low-dimensional latent representations, based on their observed interactions. In practical recommendation scenarios, users often exhibit various intents which drive them to interact with items with multiple behavior types (e.g., click, tag-as-favorite, purchase). However, the diversity of user behaviors is ignored in most of the existing approaches, which makes it difficult for them to capture heterogeneous relational structures across different types of interactive behaviors. Exploring multi-typed behavior patterns is of great importance to recommendation systems, yet is very challenging because of two aspects: i) the complex dependencies across different types of user-item interactions; ii) the diversity of such multi-behavior patterns may vary by user due to personalized preferences. To tackle the above challenges, we propose a Multi-Behavior recommendation framework with Graph Meta Network to incorporate the multi-behavior pattern modeling into a meta-learning paradigm. Our developed MB-GMN empowers the user-item interaction learning with the capability of uncovering type-dependent behavior representations, which automatically distills the behavior heterogeneity and interaction diversity for recommendations. Extensive experiments on three real-world datasets show the effectiveness of MB-GMN by significantly boosting the recommendation performance as compared to various state-of-the-art baselines. The source code is available at https://github.com/akaxlh/MB-GMN.

preprint2021arXiv

Higher-order Topological Anderson Insulators

We study disorder effects in a two-dimensional system with chiral symmetry and find that disorder can induce a quadrupole topological insulating phase (a higher-order topological phase with quadrupole moments) from a topologically trivial phase. Their topological properties manifest in a topological invariant defined based on effective boundary Hamiltonians, the quadrupole moment, and zero-energy corner modes. We find gapped and gapless topological phases and a Griffiths regime. In the gapless topological phase, all the states are localized, while in the Griffiths regime, the states at zero energy become multifractal. We further apply the self-consistent Born approximation to show that the induced topological phase arises from disorder renormalized masses. We finally introduce a practical experimental scheme with topoelectrical circuits where the predicted topological phenomena can be observed by impedance measurements. Our work opens the door to studying higher-order topological Anderson insulators and their localization properties.

preprint2021arXiv

Magnetization-tuned topological quantum phase transition in MnBi2Te4 devices

Recently, the intrinsic magnetic topological insulator MnBi2Te4 has attracted enormous research interest due to the great success in realizing exotic topological quantum states, such as the quantum anomalous Hall effect (QAHE), axion insulator state, high-Chern-number and high-temperature Chern insulator states. One key issue in this field is to effectively manipulate these states and control topological phase transitions. Here, by systematic angle-dependent transport measurements, we reveal a magnetization-tuned topological quantum phase transition from Chern insulator to magnetic insulator with gapped Dirac surface states in MnBi2Te4 devices. Specifically, as the magnetic field is tilted away from the out-of-plane direction by around 40-60 degrees, the Hall resistance deviates from the quantization value and a colossal, anisotropic magnetoresistance is detected. The theoretical analyses based on modified Landauer-Buttiker formalism show that the field-tilt-driven switching from ferromagnetic state to canted antiferromagnetic state induces a topological quantum phase transition from Chern insulator to magnetic insulator with gapped Dirac surface states in MnBi2Te4 devices. Our work provides an efficient means for modulating topological quantum states and topological quantum phase transitions.

preprint2021arXiv

Overshooting, Critical Higgs Inflation and Second Order Gravitational Wave Signatures

The self coupling $λ$ of the Higgs boson in the Standard Model may show critical behavior, i.e. the Higgs potential may have a point at an energy scale $\sim 10^{17-18}$ GeV where both the first and second derivatives (almost) vanish. In this case the Higgs boson can serve as inflaton even if its nonminimal coupling to the curvature scalar is only ${\cal O}(10)$, thereby alleviating concerns about the perturbative unitarity of the theory. We find that just before the Higgs as inflaton enters the flat region of the potential the usual slow-roll conditions are violated. This leads to "overshooting" behavior, which in turn strongly enhances scalar curvature perturbations because of the excitation of entropic (non-adiabatic) perturbations. For appropriate choice of the free parameters these large perturbations occur at length scales relevant for the formation of primordial black holes. Even if these perturbations are not quite large enough to trigger copious black hole formation, they source second order tensor perturbations, i.e. primordial gravitational waves; the corresponding energy density can be detected by the proposed space-based gravitational wave detectors DECIGO and BBO.

preprint2021arXiv

Polynomial Inflation and Dark Matter

We present a minimal UV complete framework to embed inflation and dark matter by extending the standard model with a singlet real scalar field (the inflaton) and a singlet fermionic field acting as dark matter. The inflaton features the most general renormalizable polynomial up to quartic order, which is flat due to the existence of a perturbed inflection-point, comfortably fitting CMB measurements. We also analyze (p)reheating by considering the Higgs production via inflaton decay. In the early universe, dark matter can be generated by the mediation of gravitons or inflatons. However, the production via the direct decay of the inflatons dominates, making viable a large range of dark matter masses, from $\mathcal{O}(10^{-5})$ GeV to $\mathcal{O}(10^{11})$ GeV.

preprint2021arXiv

Symmetry-adapted graph neural networks for constructing molecular dynamics force fields

Molecular dynamics is a powerful simulation tool to explore material properties. Most realistic material systems are too large to be simulated with first-principles molecular dynamics. Classical molecular dynamics has lower computational cost but requires accurate force fields to achieve chemical accuracy. In this work, we develop a symmetry-adapted graph neural networks framework, named molecular dynamics graph neural networks (MDGNN), to construct force fields automatically for molecular dynamics simulations for both molecules and crystals. This architecture consistently preserves the translation, rotation and permutation invariance in the simulations. We propose a new feature engineering method including higher order contributions and show that MDGNN accurately reproduces the results of both classical and first-principles molecular dynamics. We also demonstrate that force fields constructed by the model have good transferability. Therefore, MDGNN provides an efficient and promising option for molecular dynamics simulations of large scale systems with high accuracy.

preprint2021arXiv

Symmetry-Protected Topological Phases in a Rydberg Glass

Recent theoretical studies predict that structural disorder, serving as a bridge connecting a crystalline material to an amorphous material, can induce a topological insulator from a trivial phase. However, to experimentally observe such a topological phase transition is very challenging due to the difficulty in controlling structural disorder in a quantum material. Given experimental realization of randomly positioned Rydberg atoms, such a system is naturally suited to studying structural disorder induced topological phase transitions and topological amorphous phases. Motivated by the development, we study topological phases in an experimentally accessible one-dimensional amorphous Rydberg atom chain with random atom configurations. In the single-particle level, we find symmetry-protected topological amorphous insulators and a structural disorder induced topological phase transition, indicating that Rydberg atoms provide an ideal platform to experimentally observe the phenomenon using state-of-the-art technologies. Furthermore, we predict the existence of a gapless symmetry-protected topological phase of interacting bosons in the experimentally accessible system. The resultant many-body topological amorphous phase is characterized by a $\mathbb{Z}_2$ invariant.

preprint2020arXiv

Audio-Visual Speech Separation and Dereverberation with a Two-Stage Multimodal Network

Background noise, interfering speech and room reverberation frequently distort target speech in real listening environments. In this study, we address joint speech separation and dereverberation, which aims to separate target speech from background noise, interfering speech and room reverberation. In order to tackle this fundamentally difficult problem, we propose a novel multimodal network that exploits both audio and visual signals. The proposed network architecture adopts a two-stage strategy, where a separation module is employed to attenuate background noise and interfering speech in the first stage and a dereverberation module to suppress room reverberation in the second stage. The two modules are first trained separately, and then integrated for joint training, which is based on a new multi-objective loss function. Our experimental results show that the proposed multimodal network yields consistently better objective intelligibility and perceptual quality than several one-stage and two-stage baselines. We find that our network achieves a 21.10% improvement in ESTOI and a 0.79 improvement in PESQ over the unprocessed mixtures. Moreover, our network architecture does not require the knowledge of the number of speakers.

preprint2020arXiv

Averaging principles for non-autonomous two-time-scale stochastic reaction-diffusion equations with jump

In this paper, we aim to develop the averaging principle for a slow-fast system of stochastic reaction-diffusion equations driven by Poisson random measures. The coefficients of the equation are assumed to be functions of time, and some of them are periodic or almost periodic. Therefore, the Poisson term needs to be processed, and a new averaged equation needs to be given. For this reason, the existence of a time-dependent evolution family of measures associated with the fast equation is studied, and the family is proved to be almost periodic. Next, according to the characteristics of almost periodic functions, the averaged coefficient is defined by the evolution family of measures, and the averaged equation is given. Finally, the validity of the averaging principle is verified by using the Khasminskii method.

preprint2020arXiv

Dedge-AGMNet: an effective stereo matching network optimized by depth edge auxiliary task

To improve the performance in ill-posed regions, this paper proposes an atrous granular multi-scale network based on depth edge subnetwork(Dedge-AGMNet). According to a general fact, the depth edge is the binary semantic edge of instance-sensitive. This paper innovatively generates the depth edge ground-truth by mining the semantic and instance dataset simultaneously. To incorporate the depth edge cues efficiently, our network employs the hard parameter sharing mechanism for the stereo matching branch and depth edge branch. The network modifies SPP to Dedge-SPP, which fuses the depth edge features to the disparity estimation network. The granular convolution is extracted and extends to 3D architecture. Then we design the AGM module to build a more suitable structure. This module could capture the multi-scale receptive field with fewer parameters. Integrating the ranks of different stereo datasets, our network outperforms other stereo matching networks and advances state-of-the-art performances on the Sceneflow, KITTI 2012 and KITTI 2015 benchmark datasets.

preprint2020arXiv

Deep Bilateral Retinex for Low-Light Image Enhancement

Low-light images, i.e. the images captured in low-light conditions, suffer from very poor visibility caused by low contrast, color distortion and significant measurement noise. Low-light image enhancement is about improving the visibility of low-light images. As the measurement noise in low-light images is usually significant yet complex with spatially-varying characteristic, how to handle the noise effectively is an important yet challenging problem in low-light image enhancement. Based on the Retinex decomposition of natural images, this paper proposes a deep learning method for low-light image enhancement with a particular focus on handling the measurement noise. The basic idea is to train a neural network to generate a set of pixel-wise operators for simultaneously predicting the noise and the illumination layer, where the operators are defined in the bilateral space. Such an integrated approach allows us to have an accurate prediction of the reflectance layer in the presence of significant spatially-varying measurement noise. Extensive experiments on several benchmark datasets have shown that the proposed method is very competitive to the state-of-the-art methods, and has significant advantage over others when processing images captured in extremely low lighting conditions.

preprint2020arXiv

Deep Learning on Image Denoising: An overview

Deep learning techniques have received much attention in the area of image denoising. However, there are substantial differences in the various types of deep learning methods dealing with image denoising. Specifically, discriminative learning based on deep learning can ably address the issue of Gaussian noise. Optimization models based on deep learning are effective in estimating the real noise. However, there has thus far been little related research to summarize the different deep learning techniques for image denoising. In this paper, we offer a comparative study of deep techniques in image denoising. We first classify the deep convolutional neural networks (CNNs) for additive white noisy images; the deep CNNs for real noisy images; the deep CNNs for blind denoising and the deep CNNs for hybrid noisy images, which represents the combination of noisy, blurred and low-resolution images. Then, we analyze the motivations and principles of the different types of deep learning methods. Next, we compare the state-of-the-art methods on public denoising datasets in terms of quantitative and qualitative analysis. Finally, we point out some potential challenges and directions of future research.

preprint2020arXiv

Designing and Training of A Dual CNN for Image Denoising

Deep convolutional neural networks (CNNs) for image denoising have recently attracted increasing research interest. However, plain networks cannot recover fine details for a complex task, such as real noisy images. In this paper, we propose a Dual denoising Network (DudeNet) to recover a clean image. Specifically, DudeNet consists of four modules: a feature extraction block, an enhancement block, a compression block, and a reconstruction block. The feature extraction block with a sparse mechanism extracts global and local features via two sub-networks. The enhancement block gathers and fuses the global and local features to provide complementary information for the latter network. The compression block refines the extracted information and compresses the network. Finally, the reconstruction block is utilized to reconstruct a denoised image. The DudeNet has the following advantages: (1) The dual networks with a sparse mechanism can extract complementary features to enhance the generalization ability of the denoiser. (2) Fusing global and local features can extract salient features to recover fine details for complex noisy images. (3) A small-size filter is used to reduce the complexity of the denoiser. Extensive experiments demonstrate the superiority of DudeNet over existing state-of-the-art denoising methods.

preprint2020arXiv

Distortionless Multi-Channel Target Speech Enhancement for Overlapped Speech Recognition

Speech enhancement techniques based on deep learning have brought significant improvement on speech quality and intelligibility. Nevertheless, a large gain in speech quality measured by objective metrics, such as perceptual evaluation of speech quality (PESQ), does not necessarily lead to improved speech recognition performance due to speech distortion in the enhancement stage. In this paper, a multi-channel dilated convolutional network based frequency domain modeling is presented to enhance target speaker in the far-field, noisy and multi-talker conditions. We study three approaches towards distortionless waveforms for overlapped speech recognition: estimating complex ideal ratio mask with an infinite range, incorporating the fbank loss in a multi-objective learning and finetuning the enhancement model by an acoustic model. Experimental results proved the effectiveness of all three approaches on reducing speech distortions and improving recognition accuracy. Particularly, the jointly tuned enhancement model works very well with other standalone acoustic model on real test data.

preprint2020arXiv

Electronic states and magnetic response of MnBi2Te4 by scanning tunneling microscopy and spectroscopy

Exotic quantum phenomena have been demonstrated in the recently discovered intrinsic magnetic topological insulator MnBi2Te4. At its two-dimensional limit, the quantum anomalous Hall (QAH) effect and the axion insulator state are observed in odd and even layers of MnBi2Te4, respectively. The measured band structures exhibit intriguing and complex properties. Here we employ low-temperature scanning tunneling microscopy to study its surface states and magnetic response. The quasiparticle interference patterns indicate that the electronic structure on the topmost layer of MnBi2Te4 is different from that of the expected out-of-plane A-type antiferromagnetic phase. The topological surface states may be embedded in deeper layers beneath the topmost surface. Such novel electronic structure is presumably related to the modification of the crystalline structure during sample cleaving and the re-orientation of the magnetic moments of Mn atoms near the surface. Mn dopants substituted at the Bi site on the second atomic layer are observed. The ratio of Mn/Bi substitutions is 5%. The electronic structures fluctuate at the atomic scale on the surface, which can affect the magnetism of MnBi2Te4. Our findings shed new light on the magnetic properties of MnBi2Te4 and thus the design of magnetic topological insulators.

preprint2020arXiv

Enhancement of superconductivity in organic-inorganic hybrid topological materials

Inducing or enhancing superconductivity in topological materials is an important route toward topological superconductivity. Reducing the thickness of transition metal dichalcogenides (e.g. WTe2 and MoTe2) has provided an important pathway to engineer superconductivity in topological matter; for instance, emergent superconductivity with Tc=0.82 K was observed in monolayer WTe2 which also hosts an intriguing quantum spin Hall effect, although the bulk crystal is nonsuperconducting. However, such a monolayer sample is difficult to obtain, unstable in air, and has an extremely low Tc, which could pose a grand challenge for practical applications. Here we report an experimentally convenient approach to control the interlayer coupling to achieve tailored topological properties, enhanced superconductivity and good sample stability through organic cation intercalation of the Weyl semimetals MoTe2 and WTe2. The as-formed organic-inorganic hybrid crystals are weak topological insulators with enhanced Tc of 7.0 K for intercalated MoTe2 (0.25 K for pristine crystal) and 2.3 K for intercalated WTe2 (2.8 times compared to monolayer WTe2). Such an organic-cation intercalation method can be readily applied to many other layered crystals, providing a new pathway for manipulating their electronic, topological and superconducting properties.

preprint2020arXiv

Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning

Hand-crafted spatial features (e.g., inter-channel phase difference, IPD) play a fundamental role in recent deep learning based multi-channel speech separation (MCSS) methods. However, these manually designed spatial features are hard to incorporate into the end-to-end optimized MCSS framework. In this work, we propose an integrated architecture for learning spatial features directly from the multi-channel speech waveforms within an end-to-end speech separation framework. In this architecture, time-domain filters spanning signal channels are trained to perform adaptive spatial filtering. These filters are implemented by a 2d convolution (conv2d) layer and their parameters are optimized using a speech separation objective function in a purely data-driven fashion. Furthermore, inspired by the IPD formulation, we design a conv2d kernel to compute the inter-channel convolution differences (ICDs), which are expected to provide the spatial cues that help to distinguish the directional sources. Evaluation results on simulated multi-channel reverberant WSJ0 2-mix dataset demonstrate that our proposed ICD based MCSS model improves the overall signal-to-distortion ratio by 10.4% over the IPD based MCSS model.
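For readers unfamiliar with the hand-crafted feature this abstract starts from, the inter-channel phase difference (IPD) at one frequency bin can be sketched as the phase of one channel's spectrum times the conjugate of the other's. The code below is only an illustration of that conventional IPD definition, not the learned conv2d filters or the ICD feature proposed in the paper; the two-microphone signals, the sample delay, and the helper names are hypothetical.

```python
import cmath
import math


def dft_bin(signal, k):
    """Single DFT coefficient of a real sequence at frequency bin k."""
    n = len(signal)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))


def ipd(channel_a, channel_b, k):
    """Phase of channel A minus phase of channel B at bin k, wrapped to (-pi, pi]."""
    return cmath.phase(dft_bin(channel_a, k) * dft_bin(channel_b, k).conjugate())


# Hypothetical setup: one tone reaches the second microphone `delay` samples later.
n, k, delay = 64, 4, 2
mic_1 = [math.cos(2 * math.pi * k * i / n) for i in range(n)]
mic_2 = [math.cos(2 * math.pi * k * (i - delay) / n) for i in range(n)]

measured = ipd(mic_1, mic_2, k)
expected = 2 * math.pi * k * delay / n  # phase lag implied by the delay
```

Because the delay maps directly to a phase lag at each frequency, the IPD encodes the direction of a source relative to the microphone pair, which is the spatial cue the abstract refers to.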

preprint2020arXiv

Eulerian-Lagrangian modelling of detonative combustion in two-phase gas-droplet mixtures with OpenFOAM: validations and verifications

A hybrid Eulerian-Lagrangian solver RYrhoCentralFoam is developed based on OpenFOAM to simulate detonative combustion in two-phase gas-liquid mixtures. For Eulerian gas phase, RYrhoCentralFoam enjoys second order of accuracy in time and space discretizations and is based on finite volume method on polyhedral cells. The following developments are made based on the standard compressible flow solver rhoCentralFoam in OpenFOAM: (1) multi-component species transport, (2) detailed fuel chemistry for gas phase combustion, and (3) Lagrangian solver for gas-droplet two-phase flows and sub-models for liquid droplets. To extensively verify and validate the developments and implementations of the solver and models, a series of benchmark cases are studied, including non-reacting multi-component gaseous flows, purely gaseous detonations, and two-phase gas-droplet mixtures. The results show that the RYrhoCentralFoam solver can accurately predict the flow discontinuities (e.g. shock wave and expansion wave), molecular diffusion, auto-ignition and shock-induced ignition. Also, the RYrhoCentralFoam solver can accurately simulate gaseous detonation propagation for different fuels (e.g. hydrogen and methane), about propagation speed, detonation frontal structures and cell size. Sub-models related to the droplet phase are verified and/or validated against analytical and experimental data. It is also found that the RYrhoCentralFoam solver is able to capture the main quantities and features of the gas-droplet two-phase detonations, including detonation propagation speed, interphase interactions and detonation frontal structures. As our future work, RYrhoCentralFoam solver can also be extended for simulating two-phase detonations in dense droplet sprays.

preprint2020arXiv

High-Chern-Number and High-Temperature Quantum Hall Effect without Landau Levels

The quantum Hall effect (QHE) with quantized Hall resistance of $h/(\nu e^2)$ starts the research on topological quantum states and lays the foundation of topology in physics. Afterwards, Haldane proposed the QHE without Landau levels, showing nonzero Chern number |C|=1, which has been experimentally observed at relatively low temperatures. For emerging physics and low-power-consumption electronics, the key issues are how to increase the working temperature and realize high Chern numbers (C>1). Here, we report the experimental discovery of high-Chern-number QHE (C=2) without Landau levels and C=1 Chern insulator state displaying nearly quantized Hall resistance plateau above the Néel temperature in MnBi2Te4 devices. Our observations provide a new perspective on topological matter and open new avenues for exploration of exotic topological quantum states and topological phase transitions at higher temperatures.
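As a quick numerical check of the quantized values mentioned in this abstract, the plateau resistance can be evaluated from the exact SI values of the Planck constant and the elementary charge. This is a small arithmetic sketch, not a result from the paper; it assumes the Chern number |C| plays the role of the integer in the quantization formula, and the function name is hypothetical.

```python
PLANCK_CONSTANT = 6.62607015e-34      # J s, exact in the 2019 SI
ELEMENTARY_CHARGE = 1.602176634e-19   # C, exact in the 2019 SI


def quantized_hall_resistance(chern_number):
    """Hall resistance h / (|C| e^2) in ohms for an integer Chern number."""
    return PLANCK_CONSTANT / (abs(chern_number) * ELEMENTARY_CHARGE ** 2)


r_c1 = quantized_hall_resistance(1)   # about 25.8 kilo-ohm
r_c2 = quantized_hall_resistance(2)   # half of the C = 1 plateau
```

A C=2 state therefore shows a Hall plateau at half the resistance of the C=1 state, which is how the two are told apart in transport measurements.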

preprint2020arXiv

High-Temperature Quantum Anomalous Hall Insulators in Lithium-Decorated Iron-Based Superconductor Materials

Quantum anomalous Hall (QAH) insulator is the key material to study emergent topological quantum effects, but its ultralow working temperature limits experiments. Here, by first-principles calculations, we find a family of stable two-dimensional (2D) structures generated by lithium decoration of layered iron-based superconductor materials FeX (X = S, Se, Te), and predict room-temperature ferromagnetic semiconductors together with large-gap high-Chern-number QAH insulators in the 2D materials. The extremely robust ferromagnetic order is induced by the electron injection from Li to Fe and stabilized by strong ferromagnetic kinetic exchange in the 2D Fe layer. While in the absence of spin-orbit coupling (SOC), the ferromagnetism polarizes the system into a half Dirac semimetal state protected by mirror symmetry, the SOC effect results in a spontaneous breaking of mirror symmetry and introduces a Dirac mass term, which creates QAH states with sizable gaps (several tens of meV) and multiple chiral edge modes. We also find a 3D QAH insulator phase featured by macroscopic number of chiral conduction channels in bulk LiOH-LiFeX. The findings open new opportunities to realize novel QAH physics and applications at high temperatures.

preprint2020arXiv

Higher-order topological insulators and semimetals in generalized Aubry-André-Harper models

Higher-order topological phases of matter have been extensively studied in various areas of physics. While the Aubry-André-Harper model provides a paradigmatic example to study topological phases, it has not been explored whether a generalized Aubry-André-Harper model can exhibit a higher-order topological phenomenon. Here, we construct a two-dimensional higher-order topological insulator with chiral symmetry based on the Aubry-André-Harper model. We find the coexistence of zero-energy and nonzero energy corner-localized modes. The former is protected by the quantized quadrupole moment, while the latter by the first Chern number of the Wannier band. The nonzero-energy mode can also be viewed as the consequence of a Chern insulator localized on a surface. More interestingly, the non-zero energy corner mode can lie in the continuum of extended bulk states and form a bound state in the continuum of higher-order topological systems. We finally propose an experimental scheme to realize our model in electric circuits. Our study opens a door to further study higher-order topological phases based on the Aubry-André-Harper model.

preprint2020arXiv

LaSOT: A High-quality Large-scale Single Object Tracking Benchmark

Despite great recent advances in visual tracking, its further development, including both algorithm design and evaluation, is limited due to lack of dedicated large-scale benchmarks. To address this problem, we present LaSOT, a high-quality Large-scale Single Object Tracking benchmark. LaSOT contains a diverse selection of 85 object classes, and offers 1,550 videos totaling more than 3.87 million frames. Each video frame is carefully and manually annotated with a bounding box. This makes LaSOT, to our knowledge, the largest densely annotated tracking benchmark. Our goal in releasing LaSOT is to provide a dedicated high quality platform for both training and evaluation of trackers. The average video length of LaSOT is around 2,500 frames, where each video contains various challenge factors that exist in real world video footage, such as the targets disappearing and re-appearing. These longer video lengths allow for the assessment of long-term trackers. To take advantage of the close connection between visual appearance and natural language, we provide language specification for each video in LaSOT. We believe such additions will allow for future research to use linguistic features to improve tracking. Two protocols, full-overlap and one-shot, are designated for flexible assessment of trackers. We extensively evaluate 48 baseline trackers on LaSOT with in-depth analysis, and results reveal that there still exists significant room for improvement. The complete benchmark, tracking results as well as analysis are available at http://vision.cs.stonybrook.edu/~lasot/.

preprint2020arXiv

Lightweight image super-resolution with enhanced CNN

Deep convolutional neural networks (CNNs) with strong expressive ability have achieved impressive performances on single image super-resolution (SISR). However, their excessive amounts of convolutions and parameters usually consume high computational cost and more memory storage for training a SR model, which limits their applications to SR with resource-constrained devices in real world. To resolve these problems, we propose a lightweight enhanced SR CNN (LESRCNN) with three successive sub-blocks, an information extraction and enhancement block (IEEB), a reconstruction block (RB) and an information refinement block (IRB). Specifically, the IEEB extracts hierarchical low-resolution (LR) features and aggregates the obtained features step-by-step to increase the memory ability of the shallow layers on deep layers for SISR. To remove redundant information obtained, a heterogeneous architecture is adopted in the IEEB. After that, the RB converts low-frequency features into high-frequency features by fusing global and local features, which is complementary with the IEEB in tackling the long-term dependency problem. Finally, the IRB uses coarse high-frequency features from the RB to learn more accurate SR features and construct a SR image. The proposed LESRCNN can obtain a high-quality image by a model for different scales. Extensive experiments demonstrate that the proposed LESRCNN outperforms state-of-the-arts on SISR in terms of qualitative and quantitative evaluation. The code of LESRCNN is accessible on https://github.com/hellloxiaotian/LESRCNN.

preprint2020arXiv

Neural Spatio-Temporal Beamformer for Target Speech Separation

Purely neural network (NN) based speech separation and enhancement methods, although they can achieve good objective scores, inevitably cause nonlinear speech distortions that are harmful for automatic speech recognition (ASR). On the other hand, the minimum variance distortionless response (MVDR) beamformer with NN-predicted masks, although it can significantly reduce speech distortions, has limited noise reduction capability. In this paper, we propose a multi-tap MVDR beamformer with complex-valued masks for speech separation and enhancement. Compared to the state-of-the-art NN-mask based MVDR beamformer, the multi-tap MVDR beamformer exploits the inter-frame correlation in addition to the inter-microphone correlation that is already utilized in prior art. Further improvements include the replacement of the real-valued masks with complex-valued masks and the joint training of the complex-mask NN. The evaluation on our multi-modal multi-channel target speech separation and enhancement platform demonstrates that our proposed multi-tap MVDR beamformer improves both the ASR accuracy and the perceptual speech quality over prior art.

preprint2020arXiv

Recurrent Exposure Generation for Low-Light Face Detection

Face detection from low-light images is challenging due to limited photons and inevitable noise, which, to make the task even harder, are often spatially unevenly distributed. A natural solution is to borrow the idea from multi-exposure, which captures multiple shots to obtain well-exposed images under challenging conditions. High-quality implementation/approximation of multi-exposure from a single image is however nontrivial. Fortunately, as shown in this paper, such high quality is not necessary either, since our task is face detection rather than image enhancement. Specifically, we propose a novel Recurrent Exposure Generation (REG) module and couple it seamlessly with a Multi-Exposure Detection (MED) module, and thus significantly improve face detection performance by effectively inhibiting non-uniform illumination and noise issues. REG progressively and efficiently produces intermediate images corresponding to various exposure settings, and such pseudo-exposures are then fused by MED to detect faces across different lighting conditions. The proposed method, named REGDet, is the first 'detection-with-enhancement' framework for low-light face detection. It not only encourages rich interaction and feature fusion across different illumination levels, but also enables effective end-to-end learning of the REG component to be better tailored for face detection. Moreover, as clearly shown in our experiments, REG can be flexibly coupled with different face detectors without extra low/normal-light image pairs for training. We tested REGDet on the DARK FACE low-light face benchmark with a thorough ablation study, where REGDet outperforms previous state-of-the-art methods by a significant margin, with only negligible extra parameters.

preprint2020arXiv

Robust axion insulator and Chern insulator phases in a two-dimensional antiferromagnetic topological insulator

The intricate interplay between nontrivial topology and magnetism in two-dimensional (2D) materials has led to the emergence of many novel phenomena and functionalities. An outstanding example is the quantum anomalous Hall (QAH) effect, which was realized in magnetically doped topological insulators (TIs) in the absence of magnetic field. Recently, the layered van der Waals compound MnBi2Te4 has been theoretically predicted and experimentally verified to be a TI with interlayer antiferromagnetic (AFM) order. It is a rare stoichiometric material with coexisting topology and magnetism, and thus represents a perfect building block for complex topological-magnetic structures. Here we investigate the quantum transport behaviors of both bulk crystal and exfoliated MnBi2Te4 flakes in a field effect transistor geometry. In the 6 septuple layers (SLs) device tuned into the insulating regime, we observe a large longitudinal resistance and zero Hall plateau, which are characteristic of the axion insulator state. The robust axion insulator state occurs in zero magnetic field, over a wide magnetic field range, and at relatively high temperatures. Moreover, a moderate magnetic field drives a quantum phase transition from the axion insulator phase to a Chern insulator phase with zero longitudinal resistance and quantized Hall resistance h/e^2 (h is the Planck constant and e is the elementary charge). These results pave the road for using even-number-SL MnBi2Te4 to realize the quantized topological magnetoelectric effect and axion electrodynamics in condensed matter systems.

preprint2020arXiv

Self-supervised learning for audio-visual speaker diarization

Speaker diarization, the task of finding the speech segments of specific speakers, has been widely used in human-centered applications such as video conferences or human-computer interaction systems. In this paper, we propose a self-supervised audio-video synchronization learning method to address the problem of speaker diarization without massive labeling effort. We improve on previous approaches by introducing two new loss functions: the dynamic triplet loss and the multinomial loss. We test them on a real-world human-computer interaction system and the results show that our best model yields a remarkable gain of +8% in F1-score as well as a reduction in diarization error rate. Finally, we introduce a new large-scale audio-video corpus designed to fill the vacancy of audio-video datasets in Chinese.

preprint2020arXiv

Sound Event Detection of Weakly Labelled Data with CNN-Transformer and Automatic Threshold Optimization

Sound event detection (SED) is a task to detect sound events in an audio recording. One challenge of the SED task is that many datasets such as the Detection and Classification of Acoustic Scenes and Events (DCASE) datasets are weakly labelled. That is, there are only audio tags for each audio clip without the onset and offset times of sound events. We compare segment-wise and clip-wise training for SED, a comparison that is lacking in previous works. We propose a convolutional neural network transformer (CNN-Transformer) for audio tagging and SED, and show that CNN-Transformer performs similarly to a convolutional recurrent neural network (CRNN). Another challenge of SED is that thresholds are required for detecting sound events. Previous works set thresholds empirically, which is not an optimal approach. To solve this problem, we propose an automatic threshold optimization method. The first stage is to optimize the system with respect to metrics that do not depend on thresholds, such as mean average precision (mAP). The second stage is to optimize the thresholds with respect to metrics that depend on those thresholds. Our proposed automatic threshold optimization system achieves a state-of-the-art audio tagging F1 of 0.646, outperforming that without threshold optimization of 0.629, and a sound event detection F1 of 0.584, outperforming that without threshold optimization of 0.564.
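
The second stage described in the abstract above (tuning thresholds against a threshold-dependent metric once the model is fixed) can be sketched as follows. This is a minimal illustration and not the authors' optimizer: it assumes a plain per-class grid search that maximizes F1 on held-out clips, and the names `f1_score`, `optimize_thresholds`, and the toy arrays are hypothetical.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """Binary F1 for 0/1 arrays; returns 0.0 when nothing is predicted or labelled positive."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def optimize_thresholds(probs, labels, grid=None):
    """Pick one threshold per class that maximizes F1 on held-out data.

    probs, labels: arrays of shape (n_clips, n_classes).
    Assumption: a grid search stands in for whatever optimizer the paper uses.
    """
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    thresholds = np.full(probs.shape[1], 0.5)
    for k in range(probs.shape[1]):
        scores = [f1_score(labels[:, k], (probs[:, k] >= t).astype(int)) for t in grid]
        # argmax returns the first (lowest) threshold among ties
        thresholds[k] = grid[int(np.argmax(scores))]
    return thresholds

# Toy clip-level probabilities from an already trained model (stage one is assumed done).
probs = np.array([[0.91, 0.21], [0.62, 0.32], [0.43, 0.11], [0.22, 0.04]])
labels = np.array([[1, 1], [1, 1], [1, 0], [0, 0]])
thresholds = optimize_thresholds(probs, labels)
```

In this toy data the second class never exceeds 0.5, so a fixed empirical threshold of 0.5 would miss every positive, while the searched threshold recovers them.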

preprint2020arXiv

Strong averaging principles for a class of non-autonomous slow-fast systems of SPDEs with polynomial growth

In this work, we study a class of non-autonomous two-time-scale stochastic reaction-diffusion equations driven by Poisson random measures, in which the coefficients satisfy the polynomial growth condition and local Lipschitz condition. First, the existence and uniqueness of the mild solution are proved by constructing auxiliary equations and using the technique of stopping time. Then, considering the time dependence of the coefficients, the averaged equation is redefined by studying the existence of a time-dependent evolution family of measures associated with the frozen fast equation. Further, the strong convergence of the slow component to the solution of the corresponding averaged equation is verified by using the classical Khasminskii method.

preprint2020arXiv

Topological Insulators beyond Energy Band Characterization

Topological phases of matter are generally characterized by topological properties of energy bands of a system. Their transitions under preserved symmetries occur through closing a gap of energy bands, leading to topologically protected edge states in energy spectra in topological phases. Here we predict a new topological phase that emerges through closing a gap of bands constructed by energy bands, instead of through closing an energy gap with preserved symmetries. From this perspective, topological phases may arise from topological properties of the "bands of bands" associated with their gap closure and corresponding edge states. We demonstrate this idea by studying a tight-binding model. We find that the Wannier bands constructed by energy bands exhibit a gap closure associated with a change of a winding number, while the energy bands remain gapped and trivial without any zero energy modes. In addition, the topological Wannier bands give rise to quantized edge polarizations. Since the emergence of this topological phase does not involve any energy gap closure, we expect its appearance under unitary time evolution. Indeed, this phase appears as we perform a quench dynamics. Our study opens a new direction for exploring topological phases beyond conventional energy band characterization.

preprint2020arXiv

Tunable interlayer magnetism and band topology in van der Waals heterostructures of MnBi2Te4-family materials

Manipulating the interlayer magnetic coupling in van der Waals magnetic materials and heterostructures is the key to tailoring their magnetic and electronic properties for various electronic applications and fundamental studies in condensed matter physics. By utilizing the MnBi2Te4-family compounds and their heterostructures as a model system, we systematically studied the dependence of the sign and strength of interlayer magnetic coupling on constituent elements by using first-principles calculations. It was found that the coupling is a long-range superexchange interaction mediated by the chains of p orbitals between the magnetic atoms of neighboring septuple-layers. The interlayer exchange is always antiferromagnetic in the pure compounds, but can be tuned to ferromagnetic in some combinations of heterostructures, dictated by d orbital occupations. Strong interlayer magnetic coupling can be realized if the medial p electrons are delocalized and the d bands of magnetic atoms are near the Fermi level. The knowledge on the interlayer coupling mechanism enables us to engineer magnetic and topological properties of MnBi2Te4-family materials as well as many other insulating van der Waals magnetic materials and heterostructures.

preprint2020arXiv

Type-II Ising superconductivity and anomalous metallic state in macro-size ambient-stable ultrathin crystalline films

The recent emergence of two-dimensional (2D) crystalline superconductors has provided a promising platform to investigate novel quantum physics and potential applications. To reveal essential quantum phenomena therein, ultralow temperature transport investigation on high quality ultrathin superconducting films is critically required, although it has been quite challenging experimentally. Here we report a systematic transport study on ultrathin crystalline PdTe2 films grown by molecular beam epitaxy (MBE). Interestingly, a new type of Ising superconductivity in 2D centrosymmetric materials is revealed by the detection of a large in-plane critical field more than 7 times the Pauli limit. Remarkably, in perpendicular magnetic field, we provide solid evidence of an anomalous metallic state characterized by the resistance saturation at low temperatures with high quality filters. The robust superconductivity with intriguing quantum phenomena in the macro-size ambient-stable ultrathin PdTe2 films remains almost the same for 20 months, showing great potential in electronic and spintronic applications.

preprint2020arXiv

Type-II quadrupole topological insulators

The modern theory of electric polarization is formulated by the Berry phase, which, when quantized, leads to topological phases of matter. Such a formulation has recently been extended to higher electric multipole moments, through the discovery of the so-called quadrupole topological insulator. It has been established by a classical electromagnetic theory that in a two-dimensional material the quantized properties for the quadrupole topological insulator should satisfy a basic relation. Here we discover a new type of quadrupole topological insulator (dubbed type-II) that violates this relation due to the breakdown of the correspondence that a Wannier band and an edge energy spectrum close their gaps simultaneously. We find that, similar to the previously discovered (referred to as type-I) quadrupole topological insulator, the type-II hosts topologically protected corner states carrying fractional corner charges. However, the edge polarizations only occur at a pair of boundaries in the type-II insulating phase, leading to the violation of the classical constraint. We demonstrate that such new topological phenomena can appear from quench dynamics in non-equilibrium systems, which can be experimentally observed in ultracold atomic gases. We also propose an experimental scheme with electric circuits to realize such a new topological phase of matter. The existence of the new topological insulating phase means that new multipole topological insulators with distinct properties can exist in broader contexts beyond classical constraints.

preprint2020arXiv

Winding Numbers and Generalized Mobility Edges in Non-Hermitian Systems

The Aubry-André-Harper (AAH) model with a self-dual symmetry plays an important role in studying the Anderson localization. Here we find a self-dual symmetry determining the quantum phase transition between extended and localized states in a non-Hermitian AAH model and show that the eigenenergies of these states are characterized by two types of winding numbers. By constructing and studying a non-Hermitian generalized AAH model, we further generalize the notion of the mobility edge, which separates the localized and extended states in the energy spectrum of disordered systems, to the non-Hermitian case and find that the generalized mobility edge is of a topological nature even in the open boundary geometry in the sense that the energies of localized and extended states exhibit distinct topological structures in the complex energy plane. Finally, we propose an experimental scheme to realize these models with electric circuits.
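
As background for the self-dual symmetry mentioned above, the following sketch shows the textbook Hermitian AAH chain, not the non-Hermitian models studied in the paper. It assumes the convention H = t Σ (c†_{j+1} c_j + h.c.) + V Σ cos(2πβj + φ) n_j, for which the self-dual point sits at V = 2t, and uses the mean inverse participation ratio (IPR) to tell extended from localized eigenstates. The function names and parameter values are illustrative choices.

```python
import numpy as np

def aah_hamiltonian(n_sites, t=1.0, v=1.0, beta=(np.sqrt(5) - 1) / 2, phi=0.0):
    """Hermitian AAH chain with open boundaries (assumed convention: on-site term v*cos)."""
    sites = np.arange(n_sites)
    onsite = v * np.cos(2 * np.pi * beta * sites + phi)  # quasiperiodic potential
    hop = t * np.ones(n_sites - 1)                       # nearest-neighbour hopping
    return np.diag(onsite) + np.diag(hop, 1) + np.diag(hop, -1)

def mean_ipr(h):
    """Average of sum_j |psi_j|^4 over all eigenstates; ~1/N if extended, O(1) if localized."""
    _, vecs = np.linalg.eigh(h)  # columns are normalized eigenvectors
    return float(np.mean(np.sum(np.abs(vecs) ** 4, axis=0)))

# One potential strength on each side of the self-dual point v = 2t.
ipr_extended = mean_ipr(aah_hamiltonian(233, v=0.5))
ipr_localized = mean_ipr(aah_hamiltonian(233, v=4.0))
```

Sweeping `v` through 2t in this sketch shows the IPR rising sharply for all eigenstates at once, which is the Hermitian behaviour that the mobility-edge generalization in the abstract departs from.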

preprint2019arXiv

Berry Curvature Engineering by Gating Two-Dimensional Antiferromagnets

Recent advances in tuning electronic, magnetic, and topological properties of two-dimensional (2D) magnets have opened a new frontier in the study of quantum physics and promised exciting possibilities for future quantum technologies. In this study, we find that the dual-gate technology can well tune the electronic and topological properties of antiferromagnetic (AFM) even septuple-layer (SL) MnBi$_2$Te$_4$ thin films. Under an out-of-plane electric field that breaks $\mathcal{PT}$ symmetry, the Berry curvature of the thin film could be engineered efficiently, resulting in a huge change of anomalous Hall (AH) signal. Beyond the critical electric field, the double-SL MnBi$_2$Te$_4$ thin film becomes a Chern insulator with a high Chern number of 3. We further demonstrate that such 2D material can be used as an AFM switch via electric-field control of the AH signal. These discoveries inspire the design of low-power memory prototype for future AFM spintronic applications.

preprint2019arXiv

Solving Fokker-Planck equation using deep learning

The probability density function of stochastic differential equations is governed by the Fokker-Planck (FP) equation. A novel machine learning method is developed to solve the general FP equations based on deep neural networks. The proposed algorithm does not require any interpolation or coordinate transformation, which is different from the traditional numerical methods. The main novelty of this paper is that penalty factors are introduced to overcome the local optimization for the deep learning approach, and the corresponding setting rules are given. Meanwhile, we consider a normalization condition as a supervision condition to effectively prevent the trial solution from being zero. Several numerical examples are presented to illustrate the performance of the proposed algorithm, including one- and two-dimensional systems. All the results suggest that deep learning is quite feasible and effective for solving the FP equation. Further, influences of the number of hidden layers, the penalty factors, and the optimization algorithm are discussed in detail. These results indicate that the performance of the machine learning technique can be improved through constructing the neural networks appropriately.
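
For orientation, the one-dimensional FP equation and one plausible form of a penalized training loss are written out below. The first two lines are the standard FP equation for an Itô SDE and the normalization condition. The third line is only an assumed illustration of how a penalty factor could weight the normalization term in the stationary case: the symbols $\hat{p}_\theta$ (network trial solution), $\lambda$ (penalty factor), the collocation points $x_i$ and the grid spacing $\Delta x$ are hypothetical and not taken from the paper.

```latex
% SDE: dX_t = f(X_t)\,dt + g(X_t)\,dW_t, with density p(x,t)
\frac{\partial p}{\partial t}
  = -\frac{\partial}{\partial x}\bigl[f(x)\,p\bigr]
    + \frac{1}{2}\frac{\partial^2}{\partial x^2}\bigl[g(x)^2\,p\bigr],
\qquad
\int_{-\infty}^{\infty} p(x,t)\,dx = 1 .

% Assumed stationary-case loss with a penalty on the normalization condition
\mathcal{L}(\theta)
  = \frac{1}{N}\sum_{i=1}^{N}
    \Bigl( -\frac{\partial}{\partial x}\bigl[f\,\hat{p}_\theta\bigr](x_i)
           + \frac{1}{2}\frac{\partial^2}{\partial x^2}\bigl[g^2\,\hat{p}_\theta\bigr](x_i) \Bigr)^{2}
  + \lambda \Bigl( \sum_{i=1}^{N} \hat{p}_\theta(x_i)\,\Delta x - 1 \Bigr)^{2}
```

Without the second term the residual alone is minimized by $\hat{p}_\theta \equiv 0$, which is the trivial solution the normalization condition in the abstract is meant to rule out.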

preprint2019arXiv

Topological Phases in Non-Hermitian Aubry-André-Harper Models

Topological phases have recently witnessed rapid progress in non-Hermitian systems. Here we study a one-dimensional non-Hermitian Aubry-André-Harper model with imaginary periodic or quasiperiodic modulations. We demonstrate that the non-Hermitian off-diagonal AAH models can host zero-energy modes at the edges. In contrast to the Hermitian case, the zero-energy mode can be localized only at one edge. Such a topological phase corresponds to the existence of a quarter winding number defined by eigenenergy in momentum space. We further find the coexistence of a zero-energy mode located only at one edge and topological nonzero energy edge modes characterized by a generalized Bott index. In the incommensurate case, a topological non-Hermitian quasicrystal is predicted where all bulk states and two topological edge states are localized at one edge. Such topological edge modes are protected by the generalized Bott index. Finally, we propose an experimental scheme to realize these non-Hermitian models in electric circuits. Our findings add a new direction for exploring topological properties in Aubry-André-Harper models.

preprint2019arXiv

Type-II Ising Pairing in Few-Layer Stanene

Spin-orbit coupling has proven indispensable in realizing topological materials and, more recently, Ising pairing in two-dimensional superconductors. This pairing mechanism relies on inversion symmetry breaking and sustains anomalously large in-plane polarizing magnetic fields whose upper limit is expected to diverge at low temperatures, although experimental demonstration of this has remained elusive due to the required fields. In this work, the recently discovered superconductor few-layer stanene, i.e. epitaxially strained $α$-Sn, is shown to exhibit a new type of Ising pairing between carriers residing in bands with different orbital indices near the $Γ$-point. The bands are split as a result of spin-orbit locking without the participation of inversion symmetry breaking. The in-plane upper critical field is strongly enhanced at ultra-low temperature and reveals the sought-for upturn.