Source author record

Yuxuan Wang

Yuxuan Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

61 works

31 topics

4 close collaborators

preprint · 2026 · arXiv

Beyond Continuity: Simulation-free Reconstruction of Discrete Branching Dynamics from Single-cell Snapshots

Inferring cellular trajectories from destructive snapshots is complicated by stochasticity and by non-conservative mass dynamics such as cell proliferation and apoptosis. Existing unbalanced Optimal Transport (OT) methods treat mass as a continuous fluid, performing inference at the population level. However, this macroscopic view often fails to capture the discrete, jump-like nature of birth-death events at single-cell resolution, which is essential for understanding lineage branching and fate decisions. We present Unbalanced Schrödinger Bridge (USB), a simulation-free framework for learning the underlying dynamics that integrates both stochastic and unbalanced effects and also models discrete, jump-like birth-death dynamics at single-cell resolution. Theoretically, USB provides a tractable solution to the Branching Schrödinger Bridge (BSB) problem, offering a rigorous microscopic interpretation in which individual cells undergo both Brownian motion and discrete birth-death jumps. Technically, the method implements an efficient solver through a simulation-free training objective that scales to high-dimensional omics data. Empirically, we demonstrate on both simulated and real-world datasets that USB not only achieves trajectory reconstruction performance better than or comparable to deterministic baselines but also uniquely enables realistic discrete simulation of birth-death dynamics at single-cell resolution.

preprint · 2026 · arXiv

Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents

Current embodied agents are often limited to passive instruction-following or reactive need-satisfaction, lacking a stable, high-order value framework essential for long-term, self-directed behavior and resolving motivational conflicts. We introduce ValuePlanner, a hierarchical cognitive architecture that decouples high-level value scheduling from low-level action execution. ValuePlanner employs an LLM-based cognitive module to generate symbolic subgoals by reasoning through abstract value trade-offs, which are then translated into executable action plans by a classical PDDL planner. This process is refined via a closed-loop feedback mechanism. Evaluating such autonomy requires methods beyond task-success rates, and we therefore propose a value-centric evaluation suite measuring cumulative value gain, preference alignment, and behavioral diversity. Experiments in the TongSim household environment demonstrate that ValuePlanner arbitrates competing values to generate coherent, long-horizon, self-directed behavior absent from instruction-following and needs-driven baselines. Our work offers a structured approach to bridging intrinsic values and grounded behavior for autonomous agents.

preprint · 2026 · arXiv

Byzantine-Robust Distributed Sparse Learning Revisited

We revisit Byzantine-robust distributed estimation for high-dimensional sparse linear models. By combining local $\ell_1$-regularized robust estimation with robust aggregation at the server, the framework applies to pseudo-Huber regression, quantile regression, and sparse SVM. We show that the resulting estimators enjoy non-asymptotic guarantees and attain near-optimal statistical rates under mild conditions, while remaining communication-efficient. Simulations confirm strong robustness in estimation, support recovery, and classification accuracy under various Byzantine attacks.
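The abstract does not say which robust aggregation rule the server uses. As a minimal sketch, assuming coordinate-wise median aggregation (one common Byzantine-robust choice, not necessarily the paper's), the Python below shows the pseudo-Huber loss named above and how a median-based server ignores a single corrupted machine. The function names are hypothetical.

```python
import math
from statistics import median


def pseudo_huber(residual: float, delta: float = 1.0) -> float:
    """Pseudo-Huber loss: quadratic for small residuals, roughly delta * |r| for large ones."""
    return delta**2 * (math.sqrt(1.0 + (residual / delta) ** 2) - 1.0)


def coordinate_wise_median(local_estimates: list[list[float]]) -> list[float]:
    """Hypothetical server rule: each row is one machine's estimate; take the median per coordinate."""
    return [median(coords) for coords in zip(*local_estimates)]


honest = [[1.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.9, 0.0, 0.0], [1.0, 0.0, 0.0]]
byzantine = [[100.0, -50.0, 30.0]]  # one arbitrarily corrupted machine
print(coordinate_wise_median(honest + byzantine))  # [1.0, 0.0, 0.0]
```

A plain average of the same five rows would be pulled to 20.8 in the first coordinate, which illustrates why the server-side aggregation step has to be robust.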

preprint · 2026 · arXiv

How vehicles change lanes after encountering crashes: Empirical analysis and modeling

When a traffic crash occurs, following vehicles need to change lanes to bypass the obstruction. We define these maneuvers as post-crash lane changes (LCs). In such scenarios, vehicles in the target lane may refuse to yield even after the lane change has already begun, increasing the complexity and crash risk of post-crash LCs. However, the behavioral characteristics and motion patterns of post-crash LCs remain unknown. To address this gap, we construct a post-crash LC dataset by extracting vehicle trajectories from drone videos captured after crashes. Our empirical analysis reveals that, compared to mandatory LCs (MLCs) and discretionary LCs (DLCs), post-crash LCs exhibit longer durations, lower insertion speeds, and higher crash risks. Notably, 79.4% of post-crash LCs involve at least one instance of non-yielding behavior from the new follower, compared to 21.7% for DLCs and 28.6% for MLCs. Building on these findings, we develop a novel trajectory prediction framework for post-crash LCs. At its core is a graph-based attention module that explicitly models yielding behavior as an auxiliary interaction-aware task. This module is designed to guide both a conditional variational autoencoder and a Transformer-based decoder to predict the lane changer's trajectory. By incorporating the interaction-aware module, our model outperforms existing baselines in trajectory prediction by more than 10% in both average displacement error and final displacement error across different prediction horizons. Moreover, our model provides more reliable crash risk analysis by reducing false crash rates and improving conflict prediction accuracy. Finally, we validate the model's transferability using additional post-crash LC datasets collected from different sites.
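The two metrics reported above have standard definitions: average displacement error (ADE) is the mean Euclidean distance between predicted and ground-truth positions over the prediction horizon, and final displacement error (FDE) is that distance at the last step. The sketch below assumes 2D positions sampled at matching time steps; the paper's exact evaluation protocol (coordinate frame, horizons, multi-sample handling) is not stated here, and `ade_fde` is a hypothetical helper.

```python
import math

Point = tuple[float, float]


def ade_fde(predicted: list[Point], ground_truth: list[Point]) -> tuple[float, float]:
    """Average and final displacement error between two equal-length trajectories."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and equally long")
    # Euclidean distance at every predicted time step
    errors = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(errors) / len(errors), errors[-1]


pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(ade_fde(pred, truth))  # (1.0, 2.0)
```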

preprint · 2026 · arXiv

The AI Hippocampus: How Far are We From Human Memory?

Memory plays a foundational role in augmenting the reasoning, adaptability, and contextual fidelity of modern Large Language Models (LLMs) and Multi-Modal LLMs (MLLMs). As these models transition from static predictors to interactive systems capable of continual learning and personalized inference, the incorporation of memory mechanisms has emerged as a central theme in their architectural and functional evolution. This survey presents a comprehensive and structured synthesis of memory in LLMs and MLLMs, organizing the literature into a cohesive taxonomy comprising implicit, explicit, and agentic memory paradigms. Specifically, the survey delineates three primary memory frameworks. Implicit memory refers to the knowledge embedded within the internal parameters of pre-trained transformers, encompassing their capacity for memorization, associative retrieval, and contextual reasoning. Recent work has explored methods to interpret, manipulate, and reconfigure this latent memory. Explicit memory involves external storage and retrieval components designed to augment model outputs with dynamic, queryable knowledge representations, such as textual corpora, dense vectors, and graph-based structures, thereby enabling scalable and updatable interaction with information sources. Agentic memory introduces persistent, temporally extended memory structures within autonomous agents, facilitating long-term planning, self-consistency, and collaborative behavior in multi-agent systems, with relevance to embodied and interactive AI. Extending beyond text, the survey examines the integration of memory within multi-modal settings, where coherence across vision, language, audio, and action modalities is essential. Key architectural advances, benchmark tasks, and open challenges are discussed, including issues related to memory capacity, alignment, factual consistency, and cross-system interoperability.

preprint · 2026 · arXiv

Variable Basis Mapping for Real-Time Volumetric Visualization

Real-time visualization of large-scale volumetric data remains challenging, as direct volume rendering and voxel-based methods suffer from prohibitively high computational cost. We propose Variable Basis Mapping (VBM), a framework that transforms volumetric fields into 3D Gaussian Splatting (3DGS) representations through wavelet-domain analysis. First, we precompute a compact Wavelet-to-Gaussian Transition Bank that provides optimal Gaussian surrogates for canonical wavelet atoms across multiple scales. Second, we perform analytical Gaussian construction that maps discrete wavelet coefficients directly to 3DGS parameters using a closed-form, mathematically principled rule. Finally, a lightweight image-space fine-tuning stage further refines the representation to improve rendering fidelity. Experiments on diverse datasets demonstrate that VBM significantly accelerates convergence and enhances rendering quality, enabling real-time volumetric visualization.

preprint · 2025 · arXiv

BEDA: Belief Estimation as Probabilistic Constraints for Performing Strategic Dialogue Acts

Strategic dialogue requires agents to execute distinct dialogue acts, for which belief estimation is essential. While prior work often estimates beliefs accurately, it lacks a principled mechanism for using those beliefs during generation. We bridge this gap by first formalizing two core acts, Adversarial and Alignment, and then operationalizing them via probabilistic constraints on what an agent may generate. We instantiate this idea in BEDA, a framework that consists of a world set, a belief estimator for belief estimation, and a conditional generator that selects acts and realizes utterances consistent with the inferred beliefs. Across three settings, Conditional Keeper Burglar (CKBG, adversarial), Mutual Friends (MF, cooperative), and CaSiNo (negotiation), BEDA consistently outperforms strong baselines: on CKBG it improves success rate by at least 5.0 points across backbones and by 20.6 points with GPT-4.1-nano; on Mutual Friends it achieves an average improvement of 9.3 points; and on CaSiNo it achieves the optimal deal relative to all baselines. These results indicate that casting belief estimation as constraints provides a simple, general mechanism for reliable strategic dialogue.

preprint · 2025 · arXiv

Vestigial $d$-wave charge-$4e$ Superconductivity from Bidirectional Pair Density Waves

We analyze the leading vestigial instability due to the melting of a bidirectional pair-density-wave state in two dimensions. In a previous work by one of the authors, it was found that the interplay between pair-density-wave fluctuations with ordering momenta along the $x$ and $y$ directions can provide a strong attractive interaction for charge-$4e$ superconductivity in the $d$-wave channel. In this work, we go beyond the artificial large-$M$ mean-field theory previously adopted and compute the phase diagram by incorporating phase fluctuations of the pair-density-wave order parameters. By investigating the relevance of various topological defects, we show that the interaction in the $d$-wave channel, together with the strong anisotropy of phase fluctuations around the pair-density-wave ordering momenta, favors a vestigial charge-$4e$ superconducting order at intermediate temperatures. By contrast, a competing charge-density-wave vestigial order does not develop, due to the suppression of its stiffness.

preprint · 2024 · arXiv

Many-body higher-order topological invariant for $C_n$-symmetric insulators

Higher-order topological insulators in two spatial dimensions display fractional corner charges. While fractional charges in one dimension are known to be captured by a many-body bulk invariant computed by the Resta formula, a many-body bulk invariant for higher-order topology and the corresponding fractional corner charges has remained elusive despite several attempts. Inspired by recent work by Tada and Oshikawa, we propose a well-defined many-body bulk invariant for $C_n$-symmetric higher-order topological insulators that is valid for both non-interacting and interacting systems. Instead of relating this invariant to the bulk quadrupole moment, as was previously done, we show that in the presence of $C_n$ rotational symmetry it can be directly identified with quantized fractional corner charges. In particular, we prove that the corner charge is quantized in units of $e/n$ with $C_n$ symmetry, leading to a $\mathbb{Z}_n$ classification for higher-order topological insulators in two dimensions.

preprint · 2024 · arXiv

STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering

Recently, we have witnessed the rapid development of video question answering models. However, most models can only handle simple videos in terms of temporal reasoning, and their performance tends to drop when answering temporal-reasoning questions on long and informative videos. To tackle this problem, we propose STAIR, a Spatial-Temporal Reasoning model with Auditable Intermediate Results for video question answering. STAIR is a neural module network, which contains a program generator to decompose a given question into a hierarchical combination of several sub-tasks, and a set of lightweight neural modules to complete each of these sub-tasks. Though neural module networks have been widely studied on image-text tasks, applying them to videos is non-trivial, as reasoning on videos requires different abilities. In this paper, we define a set of basic video-text sub-tasks for video question answering and design a set of lightweight modules to complete them. Different from most prior works, modules of STAIR return intermediate outputs specific to their intentions instead of always returning attention maps, which makes these outputs easier to interpret and to combine with pre-trained models. We also introduce intermediate supervision to make these intermediate outputs more accurate. We conduct extensive experiments on several video question answering datasets under various settings to show STAIR's performance, explainability, compatibility with pre-trained models, and applicability when program annotations are not available. Code: https://github.com/yellow-binary-tree/STAIR

preprint · 2022 · arXiv

A Piecewise Monotonic Gait Phase Estimation Model for Controlling a Powered Transfemoral Prosthesis in Various Locomotion Modes

Gait phase-based control is a trending research topic for walking-aid robots, especially robotic lower-limb prostheses, and gait phase estimation is a key challenge for it. Previous research used the integral or the derivative of the human thigh angle to estimate the gait phase, but accumulated measurement errors and noise can affect the estimation results. In this paper, a more robust gait phase estimation method is proposed using a unified form of piecewise monotonic gait phase-thigh angle models for various locomotion modes. The gait phase is estimated from the thigh angle alone, which is a stable variable, so phase drift is avoided. A Kalman filter-based smoother is designed to further suppress abrupt jumps in the estimated gait phase. Based on the proposed gait phase estimation method, a gait phase-based joint angle tracking controller is designed for a transfemoral prosthesis. The proposed gait estimation method, the gait phase smoother, and the controller are evaluated through offline analysis of walking data in various locomotion modes, and the real-time performance of the gait phase-based controller is validated in an experiment on the transfemoral prosthesis.
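The abstract does not give the state model or noise settings of the Kalman filter-based smoother. As a hedged illustration only, the sketch below is the simplest causal scalar Kalman filter, assuming a random-walk model for the phase; `process_var` and `measurement_var` are hypothetical values, and the sketch ignores the wrap-around when the phase resets at the end of each gait cycle. It shows how an abrupt jump in the raw estimate is damped rather than passed straight through.

```python
def kalman_smooth_phase(measurements: list[float],
                        process_var: float = 1e-4,
                        measurement_var: float = 1e-2) -> list[float]:
    """Causal scalar Kalman filter with a random-walk state model (an assumption)."""
    if not measurements:
        return []
    estimate = measurements[0]
    error_var = 1.0  # initial uncertainty of the estimate
    smoothed = []
    for z in measurements:
        error_var += process_var  # predict: phase may drift a little between samples
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)  # update: move part of the way toward z
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


raw = [0.10, 0.11, 0.12, 0.60, 0.14, 0.15]  # 0.60 is an abrupt jump
print([round(x, 3) for x in kalman_smooth_phase(raw)])
```

Because the filter only uses past samples it can run in real time, but a random-walk model lags behind a steadily increasing phase; a constant-rate model would track it more closely at the cost of a second state variable.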

preprint · 2022 · arXiv

An Efficient Algorithm for the Partitioning Min-Max Weighted Matching Problem

The Partitioning Min-Max Weighted Matching (PMMWM) problem is an NP-hard problem that combines the problem of partitioning a group of vertices of a bipartite graph into disjoint subsets of limited size with the classical Min-Max Weighted Matching (MMWM) problem. Kress et al. proposed this problem in 2015 and also provided several algorithms, among which MP$_{\text{LS}}$ is the state of the art. In this work, we observe that there is a time bottleneck in the matching phase of MP$_{\text{LS}}$. Hence, we eliminate redundant operations during the matching iterations and propose an efficient algorithm, MP$_{\text{KM-M}}$, that greatly speeds up MP$_{\text{LS}}$. The time complexity of the bottleneck is reduced from $O(n^3)$ to $O(n^2)$. We also prove the correctness of MP$_{\text{KM-M}}$ using the primal-dual method. To test performance on diverse instances, we generate benchmarks of various types and sizes and carry out an extensive computational study of MP$_{\text{KM-M}}$ and MP$_{\text{LS}}$. The evaluation results show that MP$_{\text{KM-M}}$ greatly shortens the runtime compared with MP$_{\text{LS}}$ while yielding the same solution quality.

preprint · 2022 · arXiv

Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech

Some recent studies have demonstrated the feasibility of single-stage neural text-to-speech, which generates raw waveforms directly from text without first generating mel-spectrograms. Single-stage text-to-speech often faces two problems: a) the one-to-many mapping problem caused by multiple speech variations, and b) insufficient high-frequency reconstruction caused by the lack of supervision from ground-truth acoustic features during training. To solve problem a) and generate more expressive speech, we propose a novel phoneme-level prosody modeling method based on a variational autoencoder with normalizing flows to model underlying prosodic information in speech. We also use a prosody predictor to support end-to-end expressive speech synthesis. Furthermore, we propose a dual parallel autoencoder that introduces supervision from ground-truth acoustic features during training, solving problem b) and enabling our model to generate high-quality speech. We compare the synthesis quality with state-of-the-art text-to-speech systems on an internal expressive English dataset. Both qualitative and quantitative evaluations demonstrate the superiority and robustness of our method for lossless speech generation, while also showing a strong capability in prosody modeling.

preprint · 2022 · arXiv

GiantMIDI-Piano: A large-scale MIDI dataset for classical piano music

Symbolic music datasets are important for music information retrieval and musical analysis. However, there is a lack of large-scale symbolic datasets for classical piano music. In this article, we create the GiantMIDI-Piano (GP) dataset, containing 38,700,838 transcribed notes and 10,855 unique solo piano works composed by 2,786 composers. We extract the names of music works and composers from the International Music Score Library Project (IMSLP). We search for and download their corresponding audio recordings from the internet. We further create a curated subset containing 7,236 works composed by 1,787 composers by requiring the titles of the downloaded audio recordings to contain the composers' surnames. We apply a convolutional neural network to detect solo piano works. Then, we transcribe those solo piano recordings into Musical Instrument Digital Interface (MIDI) files using a high-resolution piano transcription system. Each transcribed MIDI file contains the onset, offset, pitch, and velocity attributes of piano notes and pedals. GiantMIDI-Piano includes 90% live performance MIDI files and 10% sequence input MIDI files. We analyse the statistics of GiantMIDI-Piano and show pitch class, interval, trichord, and tetrachord frequencies of six composers from different eras to show that GiantMIDI-Piano can be used for musical analysis. We evaluate the quality of GiantMIDI-Piano in terms of solo piano detection F1 scores, metadata accuracy, and transcription error rates. We release the source code for acquiring the GiantMIDI-Piano dataset at https://github.com/bytedance/GiantMIDI-Piano
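The curation rule above keeps a work only when the downloaded recording's title contains the composer's surname. A minimal sketch of that filter, assuming a case-insensitive substring match; the `CandidateWork` fields and example titles are hypothetical and not taken from the released code.

```python
from dataclasses import dataclass


@dataclass
class CandidateWork:
    composer_surname: str  # hypothetical field, e.g. taken from IMSLP metadata
    audio_title: str       # title of the downloaded recording


def in_curated_subset(work: CandidateWork) -> bool:
    """Keep a work only if the recording title mentions the composer's surname (case-insensitive)."""
    return work.composer_surname.casefold() in work.audio_title.casefold()


works = [
    CandidateWork("Chopin", "CHOPIN - Nocturne in E-flat major, Op. 9 No. 2"),
    CandidateWork("Liszt", "Relaxing piano music for studying"),
]
print([w.composer_surname for w in works if in_curated_subset(w)])  # ['Chopin']
```

A plain substring test is cheap and explains why the curated subset is much smaller than the full set, but it can still admit false matches for short or common surnames.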

preprint · 2022 · arXiv

Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition

Recent studies have shown that using an external Language Model (LM) benefits end-to-end Automatic Speech Recognition (ASR). However, predicting tokens that appear infrequently in the training set is still quite challenging. Long-tail prediction problems have been widely studied in many applications but have been addressed by only a few studies for ASR and LMs. In this paper, we propose a new memory augmented lookup dictionary based Transformer architecture for LMs. The newly introduced lookup dictionary incorporates rich contextual information from the training set, which is vital for correctly predicting long-tail tokens. In extensive experiments on Chinese and English datasets, our proposed method outperforms the baseline Transformer LM by a large margin on both word/character error rate and tail-token error rate, without affecting decoding efficiency. Overall, we demonstrate the effectiveness of our proposed method in boosting ASR decoding performance, especially for long-tail tokens.

preprint · 2022 · arXiv

Mixed QCD-EW corrections for Higgs leptonic decay via $HW^+W^-$ vertex

We consider the two-loop corrections to the $HW^+W^-$ vertex at order $αα_s$. We construct a canonical basis for the two-loop integrals using the Baikov representation and the intersection theory. By solving the $ε$-form differential equations, we obtain fully analytic expressions for the master integrals in terms of multiple polylogarithms, which allow fast and accurate numeric evaluation for arbitrary configurations of external momenta. We apply our analytic results to the decay process $H \to ν_e e W$, and study both the integrated and differential decay rates. Our results can also be applied to the Higgs production process via $W$ boson fusion.

preprint · 2022 · arXiv

Monte Carlo study of the pseudogap and superconductivity emerging from quantum magnetic fluctuations

The origin of the pseudogap behavior, found in many high-$T_c$ superconductors, remains one of the greatest puzzles in condensed matter physics. One possible mechanism is fermionic incoherence, which near a quantum critical point allows pair formation but suppresses superconductivity. Employing quantum Monte Carlo simulations of a model of itinerant fermions coupled to ferromagnetic spin fluctuations, represented by a quantum rotor, we report numerical evidence of pseudogap behavior, emerging from pairing fluctuations in a quantum-critical non-Fermi liquid. Specifically, we observe enhanced pairing fluctuations and a partial gap opening in the fermionic spectrum. However, the system remains non-superconducting until reaching a much lower temperature. In the pseudogap regime the system displays a "gap-filling" rather than "gap-closing" behavior, consistent with experimental observations. Our results provide the first unambiguous lattice model realization of a pseudogap state in a strongly correlated system, driven by superconducting fluctuations.

preprint · 2022 · arXiv

Network-Level Adversaries in Federated Learning

Federated learning is a popular strategy for training models on distributed, sensitive data while preserving data privacy. Prior work identified a range of security threats against federated learning protocols that poison the data or the model. However, federated learning is a networked system in which the communication between clients and server plays a critical role in learning task performance. We highlight how communication introduces another vulnerability surface in federated learning and study the impact of network-level adversaries on training federated learning models. We show that attackers dropping the network traffic from carefully selected clients can significantly decrease model accuracy on a target population. Moreover, we show that a coordinated poisoning campaign from a few clients can amplify the dropping attacks. Finally, we develop a server-side defense that mitigates the impact of our attacks by identifying and up-sampling clients likely to contribute positively to target accuracy. We comprehensively evaluate our attacks and defenses on three datasets, assuming encrypted communication channels and attackers with partial visibility of the network.

preprint · 2022 · arXiv

NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism

Although deep learning and end-to-end models have been widely used and have shown superiority in automatic speech recognition (ASR) and text-to-speech (TTS) synthesis, state-of-the-art forced alignment (FA) models are still based on hidden Markov models (HMMs). HMMs have a limited view of contextual information and are developed with long pipelines, leading to error accumulation and unsatisfactory performance. Inspired by the capability of attention mechanisms to capture long-term contextual information and learn alignments in ASR and TTS, we propose a neural network based end-to-end forced aligner called NeuFA, in which a novel bidirectional attention mechanism plays an essential role. NeuFA integrates the alignment learning of both ASR and TTS tasks in a unified framework by learning bidirectional alignment information from a shared attention matrix in the proposed bidirectional attention mechanism. Alignments are extracted from the learnt attention weights and optimized by the ASR, TTS and FA tasks in a multi-task learning manner. Experimental results demonstrate the effectiveness of our proposed model: compared with the state-of-the-art HMM-based model, the mean absolute error on the test set drops from 25.8 ms to 23.7 ms at the word level and from 17.0 ms to 15.7 ms at the phoneme level.
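The alignment results above are reported as a mean absolute error in milliseconds. A minimal sketch of that metric, assuming predicted and reference word or phoneme boundaries are already paired one-to-one and given in seconds; the abstract does not say whether start times, end times, or both are scored, and `boundary_mae_ms` is a hypothetical helper.

```python
def boundary_mae_ms(predicted_s: list[float], reference_s: list[float]) -> float:
    """Mean absolute error, in milliseconds, between paired boundary times given in seconds."""
    if not predicted_s or len(predicted_s) != len(reference_s):
        raise ValueError("need the same non-zero number of predicted and reference boundaries")
    total = sum(abs(p - r) for p, r in zip(predicted_s, reference_s))
    return 1000.0 * total / len(predicted_s)


predicted = [0.10, 0.52, 1.00]
reference = [0.12, 0.50, 1.03]
print(round(boundary_mae_ms(predicted, reference), 1))  # 23.3
```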

preprint · 2022 · arXiv

Neural Dubber: Dubbing for Videos According to Scripts

Dubbing is a post-production process of re-recording actors' dialogues, which is extensively used in filmmaking and video production. It is usually performed manually by professional voice actors who read lines with proper prosody and in synchronization with the pre-recorded videos. In this work, we propose Neural Dubber, the first neural network model to solve a novel automatic video dubbing (AVD) task: synthesizing human speech synchronized with the given video from the text. Neural Dubber is a multi-modal text-to-speech (TTS) model that utilizes the lip movement in the video to control the prosody of the generated speech. Furthermore, an image-based speaker embedding (ISE) module is developed for the multi-speaker setting, which enables Neural Dubber to generate speech with a reasonable timbre according to the speaker's face. Experiments on the chemistry lecture single-speaker dataset and the LRS2 multi-speaker dataset show that Neural Dubber can generate speech on par with state-of-the-art TTS models in terms of speech quality. Most importantly, both qualitative and quantitative evaluations show that Neural Dubber can control the prosody of synthesized speech through the video and generate high-fidelity speech temporally synchronized with the video. Our project page is at https://tsinghua-mars-lab.github.io/NeuralDubber/ .

preprint · 2022 · arXiv

SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation

Adapting to a continuously evolving environment is a safety-critical challenge inevitably faced by all autonomous driving systems. Existing image and video driving datasets, however, fall short of capturing the mutable nature of the real world. In this paper, we introduce the largest multi-task synthetic dataset for autonomous driving, SHIFT. It presents discrete and continuous shifts in cloudiness, rain and fog intensity, time of day, and vehicle and pedestrian density. Featuring a comprehensive sensor suite and annotations for several mainstream perception tasks, SHIFT allows investigating the degradation of perception system performance at increasing levels of domain shift, fostering the development of continuous adaptation strategies to mitigate this problem and to assess model robustness and generality. Our dataset and benchmark toolkit are publicly available at www.vis.xyz/shift.

preprint · 2022 · arXiv

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

VQA is an ambitious task aiming to answer any image-related question. However, in reality, it is hard to build such a system once and for all, since users' needs are continuously updated and the system has to implement new functions. Thus, Continual Learning (CL) ability is a must in developing advanced VQA systems. Recently, a pioneering work split a VQA dataset into disjoint answer sets to study this topic. However, CL on VQA involves not only the expansion of label sets (new Answer sets). It is also crucial to study how to answer questions when deploying VQA systems to new environments (new Visual scenes) and how to answer questions requiring new functions (new Question types). Thus, we propose CLOVE, a benchmark for Continual Learning On Visual quEstion answering, which contains scene- and function-incremental settings for the two aforementioned CL scenarios. In terms of methodology, the main difference between CL on VQA and CL on classification is that the former additionally involves expanding reasoning mechanisms and preventing them from being forgotten, while the latter focuses on class representation. Thus, we propose a real-data-free replay-based method tailored for CL on VQA, named Scene Graph as Prompt for Symbolic Replay. Using a piece of scene graph as a prompt, it replays pseudo scene graphs to represent past images, along with correlated QA pairs. A unified VQA model is also proposed to utilize the current and replayed data to enhance its QA ability. Finally, experimental results reveal challenges in CLOVE and demonstrate the effectiveness of our method. The dataset and code will be available at https://github.com/showlab/CLVQA.

preprint · 2022 · arXiv

The dynamical exponent of a quantum critical itinerant ferromagnet: a Monte Carlo study

We consider the effect of the coupling between 2D quantum rotors near an XY ferromagnetic quantum critical point and spins of itinerant fermions. We analyze how this coupling affects the dynamics of rotors and the self-energy of fermions. A common belief is that near a $q=0$ ferromagnetic transition, fermions induce an $Ω/q$ Landau damping of rotors (i.e., the dynamical critical exponent is $z=3$) and Landau-overdamped rotors give rise to a non-Fermi-liquid fermionic self-energy $Σ\propto ω^{2/3}$. This behavior has been confirmed in previous quantum Monte Carlo (QMC) studies. Here we show that for the XY case the behavior is different. We report the results of large-scale QMC simulations, which show that at small frequencies $z=2$ and $Σ\propto ω^{1/2}$. We argue that the new behavior is associated with the fact that a fermionic spin is by itself not a conserved quantity due to spin-spin coupling to rotors, and that a combination of self-energy and vertex corrections replaces $1/q$ in the Landau damping by a constant. We discuss the implications of these results for experiments.

preprint · 2022 · arXiv

The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

We propose two improvements to target-speaker voice activity detection (TS-VAD), the core component of our speaker diarization system submitted to the 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) challenge. These techniques are designed to handle multi-speaker conversations in real-world meeting scenarios with high speaker-overlap ratios and under heavily reverberant and noisy conditions. First, for data preparation and augmentation in training TS-VAD models, speech data containing both real meetings and simulated indoor conversations are used. Second, to refine the results obtained after TS-VAD-based decoding, we perform a series of post-processing steps that improve the VAD results and thereby reduce diarization error rates (DERs). Tested on the ALIMEETING corpus, the newly released Mandarin meeting dataset used in M2MeT, our proposed system reduces the DER by up to 66.55% on the Eval set and 60.59% on the Test set, relative to classical clustering-based diarization.
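The 66.55% and 60.59% figures above are relative reductions, i.e. the fraction of the baseline DER that the new system removes. The sketch below uses the standard DER definition (false alarm plus missed speech plus speaker confusion, divided by total reference speech time) and the relative-reduction formula; the durations are invented for illustration and are not results from the paper.

```python
def diarization_error_rate(false_alarm: float, missed: float, confusion: float,
                           total_speech: float) -> float:
    """Standard DER: (false alarm + missed speech + speaker confusion) / total reference speech time."""
    return (false_alarm + missed + confusion) / total_speech


def relative_reduction(baseline: float, system: float) -> float:
    """Fraction of the baseline error that the new system removes."""
    return (baseline - system) / baseline


# Hypothetical durations in seconds, not taken from the paper
baseline_der = diarization_error_rate(60.0, 90.0, 150.0, 1000.0)  # 0.30
system_der = diarization_error_rate(30.0, 40.0, 30.0, 1000.0)      # 0.10
print(f"{relative_reduction(baseline_der, system_der):.1%}")       # 66.7%
```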

preprint2021arXiv

ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders

This paper presents ByteSing, a Chinese singing voice synthesis (SVS) system based on duration-allocated Tacotron-like acoustic models and WaveRNN neural vocoders. Unlike conventional SVS models, ByteSing employs Tacotron-like encoder-decoder structures as its acoustic models, using CBHG models as encoders and recurrent neural networks (RNNs) as decoders. In addition, an auxiliary phoneme duration prediction model is used to expand the input sequence, which improves controllability, model stability, and tempo prediction accuracy. WaveRNN neural vocoders are adopted to further improve the voice quality of the synthesized songs. Both objective and subjective experimental results show that the proposed SVS method produces natural, expressive, and high-fidelity songs by improving pitch and spectrogram prediction accuracy, and that the models using an attention mechanism achieve the best performance.
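
A minimal sketch of duration-based expansion of the input sequence, assuming (our assumption) that predicted phoneme durations are integer frame counts and that expansion simply repeats each phoneme-level encoder state; this mirrors the common "length regulator" idea and may differ from ByteSing's exact duration-allocation scheme. `expand_by_duration` is a hypothetical name.

```python
import numpy as np


def expand_by_duration(encodings: np.ndarray, durations: np.ndarray) -> np.ndarray:
    """Repeat each phoneme-level encoding durations[i] times along the time axis.

    encodings: (num_phonemes, dim); durations: (num_phonemes,) non-negative frame counts.
    Returns a frame-level sequence of shape (durations.sum(), dim).
    """
    durations = np.asarray(durations, dtype=int)
    if encodings.shape[0] != durations.shape[0]:
        raise ValueError("one duration per phoneme is required")
    if np.any(durations < 0):
        raise ValueError("durations must be non-negative")
    return np.repeat(encodings, durations, axis=0)


# Hypothetical example: 3 phonemes with 2-dim encodings, durations in frames
enc = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
frames = expand_by_duration(enc, np.array([2, 1, 3]))
print(frames.shape)  # (6, 2)
```

Because the decoder then sees one input per output frame, tempo is set explicitly by the durations rather than learned implicitly by attention alignment.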

preprint2021arXiv

CatNet: music source separation system with mix-audio augmentation

Music source separation (MSS) is the task of separating a music piece into individual sources, such as vocals and accompaniment. Recently, neural network-based methods have been applied to MSS; they can be categorized into spectrogram-based and time-domain-based methods. However, there has been little research on using the complementary information of spectrogram and time-domain inputs for MSS. In this article, we propose a CatNet framework that concatenates a UNet separation branch using a spectrogram as input and a WavUNet separation branch using the time-domain waveform as input. We propose an end-to-end, fully differentiable system that incorporates the spectrogram calculation into CatNet. In addition, we propose a novel mix-audio data augmentation method that randomly mixes audio segments from the same source to create augmented audio segments for training. Our proposed CatNet MSS system achieves a state-of-the-art vocals separation source-to-distortion ratio (SDR) of 7.54 dB on the MUSDB18 dataset, outperforming the 6.57 dB of MMDenseNet.
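
The mix-audio augmentation described above can be sketched as summing randomly chosen segments of one source to form a new training target for that source. The number of segments mixed, the segment pools, and the sample rate below are our assumptions (the abstract does not give them), and `mix_audio_augment` is a hypothetical name.

```python
import numpy as np


def mix_audio_augment(segments: np.ndarray, num_to_mix: int,
                      rng: np.random.Generator) -> np.ndarray:
    """Sum `num_to_mix` segments drawn without replacement from one source.

    segments: (num_segments, num_samples) waveforms of a single source, e.g. vocals.
    """
    if not 1 <= num_to_mix <= segments.shape[0]:
        raise ValueError("num_to_mix must be between 1 and the number of segments")
    idx = rng.choice(segments.shape[0], size=num_to_mix, replace=False)
    return segments[idx].sum(axis=0)


rng = np.random.default_rng(0)
# Hypothetical pools of 1-second, 16 kHz segments per source
vocals_pool = rng.standard_normal((10, 16000)) * 0.1
accomp_pool = rng.standard_normal((10, 16000)) * 0.1

vocals = mix_audio_augment(vocals_pool, num_to_mix=2, rng=rng)         # training target
accompaniment = mix_audio_augment(accomp_pool, num_to_mix=2, rng=rng)
mixture = vocals + accompaniment                                        # network input
```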

preprint2021arXiv

Higher-order topological superconductors from Weyl semimetals

We propose that doped Weyl semimetals with four Weyl points are natural candidates to realize higher-order topological superconductors, which exhibit a fully gapped bulk while the surface hosts robust gapless chiral hinge states. We show that in such a doped Weyl semimetal, a featureless finite-range attractive interaction favors a $p+ip$ pairing symmetry. By analyzing its topological properties, we identify such a chiral pairing state as a higher-order topological superconductor, which, depending on the existence of a four-fold roto-inversion symmetry $\mathsf{R}_{4z}$, is either intrinsic (meaning that the corresponding hinge states can only be removed by closing the bulk gap, rather than by modifying the surface states) or extrinsic. We achieve this understanding via various methods recently developed for higher-order topology, including the Wannier representability, Wannier spectrum, and defect classification approaches. For the $\mathsf{R}_{4z}$-symmetric case, we provide a complete classification of the higher-order topological superconductors. We show that such second-order topological superconductors exhibit chiral hinge modes that are robust in the absence of interaction effects but can be eliminated at the cost of introducing surface topological order.

preprint2021arXiv

Speech enhancement with weakly labelled data from AudioSet

Speech enhancement is the task of improving the intelligibility and perceptual quality of degraded speech signals. Recently, neural network-based methods have been applied to speech enhancement. However, many of these methods require pairs of noisy and clean speech for training. We propose a speech enhancement framework that can be trained with the large-scale, weakly labelled AudioSet dataset. Weakly labelled data only contain the audio tags of audio clips, not the onset or offset times of speech. We first apply pretrained audio neural networks (PANNs) to detect anchor segments that contain speech or sound events in audio clips. Then, we randomly mix two detected anchor segments containing speech and sound events to form a mixture, and build a conditional source separation network that uses PANNs predictions as soft conditions for speech enhancement. At inference, we input a noisy speech signal, together with the one-hot encoding of "Speech" as the condition, to the trained system to predict the enhanced speech. Our system achieves a PESQ of 2.28 and an SSNR of 8.75 dB on the VoiceBank-DEMAND dataset, outperforming the previous SEGAN system (2.16 and 7.73 dB, respectively).
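
SSNR (segmental signal-to-noise ratio) averages a per-frame SNR, with each frame clamped to a fixed range, commonly [-10, 35] dB. The sketch below is a simplified version under our own assumptions (non-overlapping, unwindowed frames of 512 samples); evaluation scripts used for VoiceBank-DEMAND typically use overlapping windowed frames, so its numbers are not directly comparable with the 8.75 dB above.

```python
import numpy as np


def segmental_snr(clean: np.ndarray, enhanced: np.ndarray, frame_len: int = 512,
                  min_db: float = -10.0, max_db: float = 35.0, eps: float = 1e-10) -> float:
    """Mean per-frame SNR in dB over non-overlapping frames, each clamped to [min_db, max_db]."""
    n = min(len(clean), len(enhanced)) // frame_len * frame_len
    if n == 0:
        raise ValueError("signals are shorter than one frame")
    s = clean[:n].reshape(-1, frame_len)                       # clean frames
    e = (clean[:n] - enhanced[:n]).reshape(-1, frame_len)      # residual-noise frames
    snr = 10.0 * np.log10((np.sum(s**2, axis=1) + eps) / (np.sum(e**2, axis=1) + eps))
    return float(np.mean(np.clip(snr, min_db, max_db)))


# Usage: halving the amplitude leaves an error with 1/4 of the clean power, i.e. about 6 dB
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440 * t)
print(round(segmental_snr(clean, 0.5 * clean), 2))  # 6.02
```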

preprint2021arXiv

SU(4) symmetry in twisted bilayer graphene - an itinerant perspective

We study symmetry-broken phases in twisted bilayer graphene at small filling above charge neutrality and at Van Hove filling. We argue that the Landau functionals for the particle-hole order parameters at these fillings both have an approximate SU(4) symmetry, but differ in the sign of quartic terms. We determine the order parameter manifold of the ground state and analyze its excitations. For small fillings, we find a strong 1st-order transition to an SU(3)$\otimes$U(1) manifold of orders that break spin-valley symmetry and induce a 3-1 splitting of fermionic excitations. For Van Hove filling, we find a weak 1st-order transition to an SO(4)$\otimes$U(1) manifold of orders that preserves the two-fold band degeneracy. We discuss the effect of particle-hole orders on superconductivity and compare with strong-coupling approaches.

preprint2021arXiv

Topological and nematic superconductivity mediated by ferro-SU(4) fluctuations in twisted bilayer graphene

We propose an SU(4) spin-valley-fermion model to investigate the superconducting instabilities of twisted bilayer graphene (TBG). In this approach, bosonic fluctuations associated with an emergent SU(4) symmetry, corresponding to combined rotations in valley and spin spaces, couple to the low-energy fermions that comprise the flat bands. These fluctuations are peaked at zero wave-vector, reflecting the "ferromagnetic-like" SU(4) ground state recently found in strong-coupling solutions of microscopic models for TBG. Focusing on electronic states related to symmetry-imposed points of the Fermi surface, dubbed here "valley hot-spots" and "van Hove hot-spots", we find that the coupling to the itinerant electrons partially lifts the huge degeneracy of the ferro-SU(4) ground state manifold, favoring inter-valley order, spin-valley coupled order, ferromagnetic order, spin-current order, and valley-polarized order, depending on details of the band structure. These fluctuations, in turn, promote attractive pairing interactions in a variety of closely competing channels, including a nodeless $f$-wave state, a nodal $i$-wave state, and topological $d+id$ and $p+ip$ states with unusual Chern numbers $2$ and $4$, respectively. Nematic superconductivity, although not realized as a primary instability of the system, still appears as a consequence of the near-degeneracy of superconducting order parameters that transform as one-dimensional and two-dimensional irreducible representations of the point group $D_{6}$.

preprint2020arXiv

A hybrid text normalization system using multi-head self-attention for mandarin

In this paper, we propose a hybrid text normalization system using multi-head self-attention. The system combines the advantages of a rule-based model and a neural model for text preprocessing tasks. Previous studies of Mandarin text normalization usually rely on a set of hand-written rules, which are hard to extend to general cases. Motivated by neural models from recent studies, our proposed system performs better on our internal news corpus. This paper also describes several attempts to deal with the imbalanced distribution of patterns in the dataset. Overall, the system improves performance by over 1.5% at the sentence level and has the potential to improve further.

preprint2020arXiv

Chiral Dirac Superconductors: Second-order and Boundary-obstructed Topology

We analyze the topological properties of a chiral ${p}+i{p}$ superconductor for a two-dimensional metal/semimetal with four Dirac points. Such a system has been proposed to realize second-order topological superconductivity and host corner Majorana modes. We show that with an additional $\mathsf{C}_4$ rotational symmetry, the system is in an intrinsic higher-order topological superconductor phase, and with a lower and more natural $\mathsf{C}_2$ symmetry, it is in a boundary-obstructed topological superconductor phase. The boundary topological obstruction is protected by a bulk Wannier gap. However, we show that the well-known nested Wilson loop is in general unquantized despite the particle-hole symmetry, and thus fails as a topological invariant. Instead, we show that the higher-order topology and boundary-obstructed topology can be characterized using an alternative defect classification approach, in which the corners of a finite sample are treated as a defect of a space-filling Hamiltonian. We establish "Dirac+$({p}+i{p})$" as a sufficient condition for second-order topological superconductivity.

preprint2020arXiv

Convolutional Embedding for Edit Distance

Edit-distance-based string similarity search has many applications, such as spell correction, data de-duplication, and sequence alignment. However, computing edit distance is expensive (the standard dynamic-programming algorithm takes time proportional to the product of the two string lengths), which makes string similarity search challenging for large datasets. In this paper, we propose a deep learning pipeline (called CNN-ED) that embeds edit distance into Euclidean distance for fast approximate similarity search. A convolutional neural network (CNN) is used to generate fixed-length vector embeddings for a dataset of strings, and the loss function is a combination of the triplet loss and the approximation error. To justify our choice of a CNN rather than other structures (e.g., RNNs) as the model, we conduct a theoretical analysis showing that some basic operations in our CNN model preserve edit distance. Experimental results show that CNN-ED outperforms data-independent CGK embedding and RNN-based GRU embedding by a large margin in terms of both accuracy and efficiency. We also show that string similarity search can be significantly accelerated using CNN-based embeddings, sometimes by orders of magnitude.
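
The two ingredients named above can be sketched directly: exact Levenshtein distance by dynamic programming (the quadratic-time baseline the embedding is meant to avoid), and a loss combining a triplet term on embedding distances with an approximation-error term. The exact loss form is our assumption for illustration: the triplet margin is taken as the edit-distance gap `ed_neg - ed_pos`, the approximation term as absolute error, and `alpha` is a hypothetical weight; CNN-ED's published loss may differ.

```python
import numpy as np


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming, O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (ca != cb))  # substitution (free if chars match)
        prev = cur
    return prev[-1]


def triplet_plus_approx_loss(e_anchor, e_pos, e_neg, ed_pos, ed_neg, alpha=0.1):
    """Triplet term on embedding distances plus approximation error to edit distance.

    Assumes ed_pos <= ed_neg; margin and alpha are illustrative choices.
    """
    d_pos = np.linalg.norm(e_anchor - e_pos)
    d_neg = np.linalg.norm(e_anchor - e_neg)
    triplet = max(0.0, d_pos - d_neg + (ed_neg - ed_pos))
    approx = abs(d_pos - ed_pos) + abs(d_neg - ed_neg)
    return triplet + alpha * approx


print(edit_distance("kitten", "sitting"))  # 3
```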

preprint2020arXiv

Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech

Accent conversion (AC) transforms a non-native speaker's accent into a native accent while maintaining the speaker's voice timbre. In this paper, we propose approaches to improving both the applicability and the quality of accent conversion. First, we assume that no reference speech is available at the conversion stage, and hence we employ an end-to-end text-to-speech system trained on native speech to generate native reference speech. To improve the quality and accent of the converted speech, we introduce reference encoders that let us use multi-source information. The motivation is that acoustic features extracted from the native reference and linguistic information are complementary to conventional phonetic posteriorgrams (PPGs), so they can be concatenated with PPGs as features to improve a baseline system based only on PPGs. Moreover, we optimize the model architecture by using GMM-based attention instead of windowed attention to improve synthesis performance. Experimental results indicate that when the proposed techniques are applied, the integrated system significantly raises the scores of acoustic quality (30$\%$ relative increase in mean opinion score) and native accent (68$\%$ relative preference) while retaining the voice identity of the non-native speaker.

preprint2020arXiv

Interplay between superconductivity and non-Fermi liquid at a quantum critical point in a metal. II. The $γ$-model at a finite $T$ for $0<γ<1$

In this paper we continue the analysis of the interplay between non-Fermi liquid and superconductivity for quantum-critical systems whose low-energy physics is described by an effective model with dynamical electron-electron interaction $V(Ω_m) \propto 1/|Ω_m|^γ$ (the $γ$ model). In paper I [A. Abanov and A. V. Chubukov, Phys. Rev. B 102, 024524 (2020)], two of us analyzed the $γ$ model at $T=0$ for $0<γ<1$ and argued that there exists a discrete, infinite set of topologically distinct solutions for the superconducting gap, all with the same spatial symmetry. The gap function $Δ_n (ω_m)$ for the $n$th solution changes sign $n$ times as a function of Matsubara frequency. In this paper we analyze the linearized gap equation at finite $T$. We show that there exists an infinite set of pairing instability temperatures, $T_{p,n}$, and the eigenfunction $Δ_n (ω_{m})$ changes sign $n$ times as a function of the Matsubara number $m$. We argue that $Δ_n (ω_{m})$ retains its functional form below $T_{p,n}$ and at $T=0$ coincides with the $n$th solution of the nonlinear gap equation. As in paper I, we extend the model to the case when the interaction in the pairing channel has an additional factor $1/N$ compared to that in the particle-hole channel. We show that $T_{p,0}$ remains finite at large $N$ due to special properties of fermions with Matsubara frequencies $\pm πT$, but all other $T_{p,n}$ terminate at $N_{cr} = O(1)$. The gap function vanishes at $T \to 0$ for $N > N_{cr}$ and remains finite for $N < N_{cr}$. This is consistent with the $T =0$ analysis.

preprint2020arXiv

PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

Audio pattern recognition is an important research topic in machine learning and includes several tasks, such as audio tagging, acoustic scene classification, music classification, speech emotion classification, and sound event detection. Recently, neural networks have been applied to tackle audio pattern recognition problems. However, previous systems are built on specific datasets with limited durations. In computer vision and natural language processing, systems pretrained on large-scale datasets have generalized well to several tasks. However, there is limited research on pretraining systems on large-scale datasets for audio pattern recognition. In this paper, we propose pretrained audio neural networks (PANNs) trained on the large-scale AudioSet dataset. These PANNs are transferred to other audio-related tasks. We investigate the performance and computational complexity of PANNs modeled by a variety of convolutional neural networks. We propose an architecture called Wavegram-Logmel-CNN that uses both the log-mel spectrogram and the waveform as input features. Our best PANN system achieves a state-of-the-art mean average precision (mAP) of 0.439 on AudioSet tagging, outperforming the best previous system's 0.392. We transfer PANNs to six audio pattern recognition tasks and demonstrate state-of-the-art performance in several of them. We have released the source code and pretrained models of PANNs: https://github.com/qiuqiangkong/audioset_tagging_cnn.
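
The mAP figure above is a macro average of per-class average precision over the tagging outputs. A minimal NumPy sketch of non-interpolated AP (mean precision at the rank of each positive clip) is below; it matches scikit-learn's `average_precision_score` when there are no tied scores. Skipping classes without positives is our assumption, not a detail given in the abstract.

```python
import numpy as np


def average_precision(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Non-interpolated AP for one class: mean precision at the rank of each positive."""
    order = np.argsort(-y_score, kind="stable")     # highest score first
    hits = y_true[order].astype(bool)
    if not hits.any():
        raise ValueError("class has no positive examples")
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())


def mean_average_precision(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Macro-average AP over classes; inputs have shape (num_clips, num_classes).

    Classes with no positive clips are skipped (an assumption).
    """
    aps = [average_precision(y_true[:, c], y_score[:, c])
           for c in range(y_true.shape[1]) if y_true[:, c].any()]
    return float(np.mean(aps))
```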

preprint2020arXiv

Quantum Phase Transition in the Yukawa-SYK Model

We study the quantum phase transition upon variation of the fermionic density $ν$ in a solvable model with random Yukawa interactions between $N$ bosons and $M$ fermions, dubbed the Yukawa-SYK model. We show that there are two distinct phases in the model: an incompressible state with gapped excitations and an exotic quantum-critical non-Fermi liquid (NFL) state with exponents varying with $ν$. We show analytically and numerically that the quantum phase transition between these two states is first-order, as for some range of $ν$ the NFL state has a negative compressibility. In the limit $N/M\to \infty$ the first-order transition gets weaker and asymptotically becomes second-order, with an exotic quantum-critical behavior. We show that fermions and bosons display highly unconventional spectral behavior in the transition region.

preprint2020arXiv

Review of Text Style Transfer Based on Deep Learning

Text style transfer is a hot topic in recent natural language processing research. It studies how to adapt a text to different situations, audiences, and purposes by making certain changes to it. The style of a text includes many aspects, such as morphology, grammar, emotion, complexity, fluency, tense, and tone. Traditional text style transfer models generally rely on expert knowledge and hand-designed rules, but with the application of deep learning to natural language processing, deep-learning-based text style transfer methods have started to be heavily researched. This article summarizes recent research on deep-learning-based text style transfer models, and analyzes and compares the main research directions and progress. In addition, the article introduces public datasets and evaluation metrics commonly used for text style transfer. Finally, the characteristics of existing text style transfer models are summarized, and future development trends of deep-learning-based text style transfer are analyzed and forecast.

preprint2020arXiv

Solvable Strong-coupling Quantum Dot Model with a Non-Fermi-liquid Pairing Transition

We show that a random interacting model exhibits solvable non-Fermi liquid behavior and exotic pairing behavior. This model, dubbed the Yukawa-SYK model, describes the random Yukawa coupling between $M$ quantum dots, each hosting $N$ flavors of fermions, and $N^2$ bosons that self-tune to criticality at low energies. The diagrammatic expansion is controlled by $1/MN$, and the results become exact in a large-$M$, large-$N$ limit. We find that pairing develops only within a region of the $(M,N)$ plane: even though the pairing interaction is strongly attractive, the incoherence of the fermions can prevent the formation of Cooper pairs, rendering the system a non-Fermi liquid down to zero temperature. By solving the Eliashberg equation and the renormalization group equation, we show that the transition into the pairing phase exhibits Kosterlitz-Thouless quantum-critical behavior.

preprint2020arXiv

Source separation with weakly labelled data: An approach to computational auditory scene analysis

Source separation is the task of separating an audio recording into individual sound sources. Source separation is fundamental for computational auditory scene analysis. Previous work on source separation has focused on separating particular sound classes such as speech and music, and much of it requires pairs of mixtures and clean sources for training. In this work, we propose a source separation framework trained with weakly labelled data. Weakly labelled data only contain the tags of an audio clip, without the occurrence times of sound events. We first train a sound event detection system with AudioSet. The trained sound event detection system is used to detect the segments that are most likely to contain a target sound event. Then a regression is learnt from a mixture of two randomly selected segments to a target segment, conditioned on the audio tagging prediction of the target segment. Our proposed system can separate 527 kinds of sound classes from AudioSet within a single system. A U-Net is adopted for the separation system and achieves an average SDR of 5.67 dB over the 527 sound classes in AudioSet.
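
The training-data construction described above can be sketched as: sum two detected segments to form the input mixture, keep one of them as the regression target, and use the tagging probabilities of that target as a soft conditioning vector. At inference, a single class can be requested with a one-hot condition, analogous to the one-hot "Speech" condition in the speech enhancement abstract earlier in this list. Segment length, sample rate, and the function names are hypothetical, and the separation network itself is omitted.

```python
import numpy as np


def make_training_example(target_seg: np.ndarray, other_seg: np.ndarray,
                          target_tag_probs: np.ndarray):
    """Return (input mixture, regression target, conditioning vector).

    target_tag_probs: audio-tagging probabilities predicted for the target segment
    by a sound event detector; used as a soft condition.
    """
    if target_seg.shape != other_seg.shape:
        raise ValueError("segments must have the same length")
    mixture = target_seg + other_seg
    return mixture, target_seg, np.asarray(target_tag_probs, dtype=float)


def one_hot_condition(class_index: int, num_classes: int = 527) -> np.ndarray:
    """Hard condition for inference: request a single sound class."""
    cond = np.zeros(num_classes)
    cond[class_index] = 1.0
    return cond


rng = np.random.default_rng(0)
seg_a = rng.standard_normal(32000) * 0.1   # hypothetical 1-s segments at 32 kHz
seg_b = rng.standard_normal(32000) * 0.1
probs = rng.uniform(size=527)              # stand-in for the tagger's output
x, y, c = make_training_example(seg_a, seg_b, probs)
# A separation network f(x, c) would then be trained to regress y (e.g. with an L1 loss).
```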

preprint2020arXiv

Xiaomingbot: A Multilingual Robot News Reporter

This paper presents Xiaomingbot, an intelligent, multilingual, and multimodal software robot equipped with four integral capabilities: news generation, news translation, news reading, and avatar animation. The system summarizes Chinese news that it automatically generates from data tables. Next, it translates the summary or the full article into multiple languages and reads the multilingual rendition aloud through synthesized speech. Notably, Xiaomingbot uses voice cloning technology to synthesize speech in the voice of a real person, trained on that person's voice data in one input language. The proposed system has several merits: it has an animated avatar and is able to generate and read multilingual news. Since it was put into practice, Xiaomingbot has written over 600,000 articles and gained over 150,000 followers on social media platforms.

preprint2019arXiv

The interplay between superconductivity and non-Fermi liquid at a quantum-critical point in a metal

Near a quantum-critical point, a metal reveals two competing tendencies: destruction of fermionic coherence and attraction in one or more pairing channels. We analyze the competition within Eliashberg theory for a class of quantum-critical models with an effective dynamical electron-electron interaction $V(Ω_m) \propto 1/|Ω_m|^γ$ (the $γ$-model) for $0 < γ<1$. We argue that the two tendencies are comparable in strength, yet the one towards pairing is stronger, and the ground state is a superconductor. We show, however, that there exist two distinct regimes of system behavior below the onset temperature of the pairing $T_p$. In the range $T_{cr} < T < T_p$ fermions remain incoherent and the density of states $N(ω)$ displays "gap filling" behavior in which the position of the maximum in $N(ω)$ is set by temperature rather than the pairing gap. At lower $T < T_{cr}$, fermions acquire coherence, and $N(ω)$ displays a conventional "gap closing" behavior. We argue that the existence of the two regimes comes about because of special behavior of fermions with frequencies $ω= \pm πT$ along the Matsubara axis. Specifically, for these fermions, the component of the self-energy, which competes with the pairing, vanishes in the normal state. We further argue that the crossover at $T \sim T_{cr}$ comes about because Eliashberg equations allow an infinite number of topologically distinct solutions for the onset temperature of the pairing within the same gap symmetry. Only one solution, with the highest $T_p$, actually emerges, but other solutions are generated and modify the form of the gap function at $T \leq T_{cr}$. Finally, we argue that the actual $T_c$ is comparable to $T_{cr}$, while at $T_{cr} < T < T_{p}$ phase fluctuations destroy superconducting long-range order, and the system displays pseudogap behavior.

preprint2019arXiv

The Physics of Pair Density Waves

We review the physics of pair density wave (PDW) superconductors. We begin with a macroscopic description that emphasizes orders induced by PDW states, such as charge density waves, and discuss related vestigial states that emerge as a consequence of partial melting of the PDW order. We review and critically discuss the mounting experimental evidence for such PDW order in the cuprate superconductors, the status of the theoretical microscopic description of such order, and the current debate on whether the PDW is a "mother order" or another competing order in the cuprates. In addition, we give an overview of the weak coupling version of PDW order, Fulde-Ferrell-Larkin-Ovchinnikov states, in the context of cold atom systems, unconventional superconductors, and non-centrosymmetric and Weyl materials.

preprint2016arXiv

Interplay between uni-directional and bi-directional charge-density-wave orders in underdoped cuprates

We analyze the interplay between charge-density-wave (CDW) orders with axial momenta $(Q, 0)$ and $(0,Q)$ ($Δ_x$ and $Δ_y$ respectively), detected in the underdoped cuprates. The CDW order in real space can be uni-directional (either $Δ_x$ or $Δ_y$ is non-zero) or bi-directional (both $Δ_x$ and $Δ_y$ are non-zero). To understand which of the two orders develops, we adopt the magnetic scenario, in which the CDW order appears due to spin-fluctuation exchange, and derive the Ginzburg-Landau action to the sixth order in $Δ_x$ and $Δ_y$. We argue that, at the mean-field level, the CDW order is bi-directional at the onset, with equal amplitudes of $Δ_x$ and $Δ_y$, but changes to uni-directional inside the CDW phase. This implies that, at a given temperature, CDW order is uni-directional at smaller dopings, but becomes bi-directional at larger dopings. This is consistent with recent x-ray data on YBCO, which detected a tendency towards bi-directional order at larger dopings. We discuss the role of discrete symmetry breaking at a higher temperature for the interplay between bi-directional and uni-directional CDW orders and also discuss the role of pair-density-wave (PDW) order, which may appear along with CDW. We argue that PDW with the same momentum as CDW changes the structure of the bi-directional charge order by completely replacing either the $Δ_x$ or the $Δ_y$ CDW component by PDW. However, if an "Amperean" PDW order, which pairs fermions with approximately the same momenta, is also present, both $Δ_x$ and $Δ_y$ remain non-zero in the bi-directional phase, albeit with non-equal amplitudes. This is again consistent with x-ray experiments, which at larger doping found non-equal $Δ_x$ and $Δ_y$ in every domain.
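
The competition between uni- and bi-directional order can be illustrated with a simplified quartic Ginzburg-Landau free energy $F = α(x+y) + β(x^2+y^2) + λxy$, with $x=|Δ_x|^2$ and $y=|Δ_y|^2$. This truncation and the symbols $α, β, λ$ are our simplification: the paper works to sixth order, and the sixth-order terms that, per the abstract, drive the change to uni-directional order inside the CDW phase are not included. At quartic order, minimizing gives $F_{\rm uni} = -α^2/(4β)$ and $F_{\rm bi} = -α^2/(2β+λ)$, so bi-directional order wins when $λ < 2β$.

```python
def ground_state(alpha: float, beta: float, lam: float) -> str:
    """Compare mean-field energies of uni- and bi-directional CDW order for
    F = alpha*(x + y) + beta*(x**2 + y**2) + lam*x*y, x = |Delta_x|^2, y = |Delta_y|^2.

    A simplified quartic truncation for illustration (hypothetical parameters).
    """
    if alpha >= 0:
        return "disordered"
    if beta <= 0 or 2 * beta + lam <= 0:
        raise ValueError("free energy is unbounded at quartic order")
    f_uni = -alpha**2 / (4 * beta)        # minimum at x = -alpha/(2*beta), y = 0
    f_bi = -alpha**2 / (2 * beta + lam)   # minimum at x = y = -alpha/(2*beta + lam)
    return "bi-directional" if f_bi < f_uni else "uni-directional"
```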

preprint2016arXiv

Superconductivity near a quantum-critical point: The special role of the first Matsubara frequency

Near a quantum-critical point in a metal, strong fermion-fermion interaction mediated by a soft collective boson gives rise to incoherent, non-Fermi liquid behavior. It also often gives rise to superconductivity, which masks the non-Fermi liquid behavior. We analyze the interplay between superconductivity and fermionic incoherence for a set of quantum-critical models with effective dynamical interaction between low-energy fermions. We argue that superconductivity always develops above a quantum-critical point, no matter how strong the fermionic self-energy is. We argue that superconductivity should not be viewed as a pairing of incoherent fermions, as previously thought. Rather, $T_c$ is non-zero due to the fact that the self-energy is suppressed at the two lowest fermionic Matsubara frequencies $ω_m = \pm πT$. We obtain an analytic formula for $T_c$ that agrees well with numerical results for the electron-phonon model at vanishing Debye frequency.
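
The special role of $ω_m = \pm πT$ can be checked numerically. The sketch assumes the standard Eliashberg form of the $γ$-model normal-state self-energy, $Σ(ω_m) = πT \sum_{m' \neq m} \mathrm{sgn}(ω_{m'})\,(g/|ω_m - ω_{m'}|)^γ$ with $ω_m = πT(2m+1)$, taken from the broader $γ$-model literature rather than from this abstract; the $m'=m$ (thermal) term is excluded because it cancels between self-energy and pairing vertex. Terms with $m' = m \pm k$ pair up: for $m=0$ they cancel exactly, while for $m=1$ only the $k=1$ pair survives, giving $Σ(ω_1) = 2πT\,(g/2πT)^γ$. The values of $g$, $γ$, and the cutoff are illustrative.

```python
import math


def gamma_model_self_energy(m: int, T: float, g: float = 1.0, gamma: float = 0.5,
                            cutoff: int = 1000) -> float:
    """Sigma(omega_m) = pi*T * sum_{m' != m} sgn(omega_m') * (g/|omega_m - omega_m'|)**gamma.

    omega_m = pi*T*(2m + 1); the sum is truncated to |m' - m| <= cutoff.
    """
    terms = []
    for k in range(1, cutoff + 1):                      # k = |m - m'|, so m' = m is excluded
        weight = (g / (2 * math.pi * T * k)) ** gamma   # |omega_m - omega_m'| = 2*pi*T*k
        for mp in (m - k, m + k):
            sign = 1.0 if mp >= 0 else -1.0             # omega_m' > 0 iff m' >= 0
            terms.append(sign * weight)
    return math.pi * T * math.fsum(terms)               # fsum: accurate cancellation


print(gamma_model_self_energy(0, T=0.1))  # 0.0 at omega_0 = pi*T
```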

preprint2016arXiv

Topological density-wave states in a particle-hole symmetric Weyl metal

We study the instabilities of a particle-hole symmetric Weyl metal with both electron and hole Fermi surfaces (FSs) around the Weyl points. For a repulsive interaction, we find that the leading instability is towards a longitudinal spin-density-wave order (SDW$_z$). In addition, there are three degenerate subleading instabilities: a charge-density-wave (CDW) instability and two transverse spin-density-wave (SDW$_{x,y}$) instabilities. For an attractive interaction, the leading instabilities are towards two pair-density-wave (PDW) orders, which pair the two FSs separately. Both the PDW and SDW$_z$ order parameters fully gap out the FSs, while the CDW and SDW$_{x,y}$ ones leave line nodes on both FSs. For the SDW$_z$ and PDW states, the surface Fermi arc in the metallic state evolves into a chiral Fermi line that passes through the projections of the Weyl points and traverses the full momentum space. For the CDW state, the line node projects to a "drumhead" band localized on the surface, which can lead to a topological charge polarization. We verify the surface states by numerically simulating angle-resolved photoemission spectroscopy data.

preprint2016arXiv

Topological superconducting phases from inversion symmetry breaking order in spin-orbit-coupled systems

We analyze the superconducting instabilities in the vicinity of the quantum-critical point of an inversion symmetry breaking order. We first show that the fluctuations of the inversion symmetry breaking order lead to two degenerate superconducting (SC) instabilities, one in the $s$-wave channel, and the other in a time-reversal invariant odd-parity pairing channel (the simplest case being the same as that of the $^3$He-B phase). Remarkably, we find that unlike many well-known examples, the selection of the pairing symmetry of the condensate is independent of the momentum-space structure of the collective mode that mediates the pairing interaction. We find that this degeneracy is a result of the existence of a conserved fermionic helicity, $χ$, and the two degenerate channels correspond to even and odd combinations of SC order parameters with $χ=\pm1$. As a result, the system has an enlarged $U(1)\times U(1)$ symmetry, with each $U(1)$ corresponding to one value of the helicity $χ$. Because of the enlarged symmetry, this system admits exotic topological defects such as a fractional quantum vortex, which we show has a Majorana zero mode bound at its core. We discuss how the enlarged symmetry can be lifted by small perturbations, such as the Coulomb interaction or Fermi surface splitting in the presence of broken inversion symmetry, and we show that the resulting superconducting state can be topological or trivial depending on parameters. The $U(1)\times U(1)$ symmetry is restored at the phase boundary between the topological and trivial SC states, and allows for a transition between topologically distinct SC phases without the vanishing of the order parameter. We present a global phase diagram of the superconducting states and discuss possible experimental implications.

preprint2016arXiv

Trainable Frontend For Robust and Far-Field Keyword Spotting

Robust and far-field speech recognition is critical to enable true hands-free communication. In far-field conditions, signals are attenuated due to distance. To improve robustness to loudness variation, we introduce a novel frontend called per-channel energy normalization (PCEN). The key ingredient of PCEN is the use of an automatic gain control based dynamic compression to replace the widely used static (such as log or root) compression. We evaluate PCEN on the keyword spotting task. On our large rerecorded noisy and far-field eval sets, we show that PCEN significantly improves recognition performance. Furthermore, we model PCEN as neural network layers and optimize high-dimensional PCEN parameters jointly with the keyword spotting acoustic model. The trained PCEN frontend demonstrates significant further improvements without increasing model complexity or inference-time cost.
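
PCEN replaces static log compression with $\mathrm{PCEN}(t,f) = \big(E(t,f)/(\epsilon + M(t,f))^{α} + δ\big)^{r} - δ^{r}$, where $M$ is a first-order IIR smoothing of the filterbank energy $E$ over time in each channel, $M(t) = (1-s)M(t-1) + sE(t)$. The sketch below uses fixed scalar parameters with commonly cited default values; treat those values, and the choice to start the smoother at the first frame, as assumptions. In the trainable frontend described above, such parameters are instead learned per channel.

```python
import numpy as np


def pcen(E: np.ndarray, s: float = 0.025, alpha: float = 0.98, delta: float = 2.0,
         r: float = 0.5, eps: float = 1e-6) -> np.ndarray:
    """Per-channel energy normalization of filterbank energies E, shape (frames, channels).

    M[t] = (1 - s) * M[t-1] + s * E[t] is run independently in each channel, and the
    output is (E / (eps + M)**alpha + delta)**r - delta**r.
    """
    E = np.asarray(E, dtype=float)
    M = np.empty_like(E)
    M[0] = E[0]                                  # assumption: smoother starts at the first frame
    for t in range(1, E.shape[0]):
        M[t] = (1.0 - s) * M[t - 1] + s * E[t]
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r
```

With `alpha=1.0`, the ratio `E / M` cancels any constant gain applied to the input (up to `eps`), which is the automatic-gain-control effect behind PCEN's robustness to loudness variation; `alpha` slightly below 1 makes the normalization partial.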

preprint2015arXiv

Coexistence of charge-density-wave and pair-density-wave orders in underdoped cuprates

We analyze incommensurate charge-density-wave (CDW) and pair-density-wave (PDW) orders with transferred momenta $(\pm Q,0)$/$(0,\pm Q)$ in underdoped cuprates within the spin-fermion model. Both orders appear due to exchange of spin fluctuations before magnetic order develops. We argue that the ordered state with the lowest energy has non-zero CDW and PDW components with the same momentum. Such a state breaks $C_4$ lattice rotational symmetry, time-reversal symmetry, and mirror symmetries. We argue that the feedback from CDW/PDW order on fermionic dispersion is consistent with ARPES data. We discuss the interplay between the CDW/PDW order and $d_{x^2-y^2}$ superconductivity and make specific predictions for experiments.

preprint2015arXiv

Enhancement of superconductivity at the onset of charge-density-wave order in a metal

We analyze superconductivity in the cuprates near the onset of an incommensurate charge-density-wave (CDW) order with momentum ${\bf Q} = (Q,0)/(0,Q)$, as observed in experiments. We first consider a semi-phenomenological charge-fermion model in which hot fermions, separated by ${\bf Q}$, attract each other by exchanging soft CDW fluctuations. We find that in a quantum-critical region near the CDW transition, $T_c = A {\bar g}_c$, where ${\bar g}_c$ is the charge-fermion coupling and $A$ is a prefactor that we compute explicitly. We then consider a particular microscopic scenario in which the CDW order parameter emerges as a composite field made out of primary spin-density-wave fields. We show that the charge-fermion coupling ${\bar g}_c$ is of the order of the spin-fermion coupling ${\bar g}_s$. As a consequence, the superconducting $T_c$ is substantially enhanced near the onset of CDW order. Finally, we analyze the effect of an external magnetic field $H$. We show that, as $H$ increases, the optimal $T_c$ decreases and the superconducting dome becomes progressively more confined to the CDW quantum-critical point. These results are consistent with experiments.

preprint2015arXiv

Fluctuating charge order in the cuprates: spatial anisotropy and feedback from superconductivity

We analyze the form of the static charge susceptibility $χ(q)$ in underdoped cuprates near the axial momenta $(Q,0)$ and $(0,Q)$ at which short-range static charge order has been observed. We show that the momentum dependence of $χ(q)$ is anisotropic and that the correlation length in the longitudinal direction is larger than in the transverse direction. We show that the correlation lengths in both directions decrease once the system evolves into a superconductor, as a result of the competition between superconductivity and charge order. These results agree with resonant x-ray scattering data [R. Comin et al., Science 347, 1335 (2015)]. We also argue that the density and current components of the charge order parameter are affected differently by superconductivity: the charge-density component is reduced less than the current component and hence extends deeper into the superconducting state. This gives rise to two distinct charge order transitions at zero temperature.
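The abstract does not give the functional form of $χ(q)$. A common phenomenological assumption is an anisotropic Lorentzian (Ornstein-Zernike) peak, $χ(q) \propto 1/(1 + ξ_L^2 δq_\parallel^2 + ξ_T^2 q_\perp^2)$, in which the half width at half maximum of each line cut equals the inverse correlation length, so the larger longitudinal $ξ_L$ shows up as the narrower peak along ${\bf Q}$. A minimal sketch under that assumption; the Lorentzian form and the correlation lengths used below are illustrative, not taken from the paper:

```python
from typing import Callable


def chi(dq_par: float, q_perp: float, xi_l: float, xi_t: float, chi0: float = 1.0) -> float:
    """Assumed anisotropic Lorentzian susceptibility around the ordering momentum.

    dq_par: deviation from Q along Q (longitudinal); q_perp: transverse deviation.
    """
    return chi0 / (1.0 + (xi_l * dq_par) ** 2 + (xi_t * q_perp) ** 2)


def hwhm(f: Callable[[float], float], q_max: float, tol: float = 1e-12) -> float:
    """Half width at half maximum of a peak f(q) centred at q = 0, found by bisection.

    Assumes f decreases monotonically on [0, q_max] and f(q_max) < f(0) / 2.
    """
    half = 0.5 * f(0.0)
    lo, hi = 0.0, q_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > half:
            lo = mid  # still above half maximum: the crossing lies further out
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical correlation lengths in lattice units (not values from the paper).
xi_long, xi_trans = 20.0, 10.0
w_long = hwhm(lambda q: chi(q, 0.0, xi_long, xi_trans), q_max=1.0)
w_trans = hwhm(lambda q: chi(0.0, q, xi_long, xi_trans), q_max=1.0)
print(round(w_long, 6), round(w_trans, 6))  # 0.05 0.1
```

Reading widths off line cuts in this way is how a shrinking correlation length would appear in scattering data: both widths grow as $ξ_L$ and $ξ_T$ decrease.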

preprint2015arXiv

Interplay between pair-density-wave and charge-density-wave orders in underdoped cuprates

We analyze the interplay between charge-density-wave (CDW) and pair-density-wave (PDW) orders within the spin-fermion model for the cuprates. We specifically consider CDW order with transferred momenta $(\pm Q,0)$/$(0,\pm Q)$ and PDW order with total momenta $(0,\pm Q)/(\pm Q,0)$. We show that both emerge in the spin-fermion model near the onset of antiferromagnetism. We further argue that the two orders are nearly degenerate due to an approximate SU(2) particle-hole symmetry of the model. The ${\rm SU}(2)$ symmetry becomes exact if one neglects the curvature of the Fermi surface in hot regions, in which case the ${\rm U}(1)$ CDW and PDW order parameters become components of an SO(4)-symmetric PDW/CDW "super-vector". We develop a Ginzburg-Landau theory for the PDW/CDW order parameters and find two possible ground states: a "stripe" state and a "checkerboard" state. We show that the ${\rm SO}(4)$ symmetry between CDW and PDW is broken by two effects. One is the inclusion of Fermi surface curvature, which selects PDW order immediately below the instability temperature. The other is the overlap between different hot regions, which favors CDW order at low temperatures. For the stripe state, we show that the competition between the two effects gives rise to a first-order transition from PDW to CDW inside the ordered state. We also argue that, beyond mean-field theory, the onset temperature for CDW order is further enhanced by feedback from a preemptive breaking of ${\mathbb Z}_2$ time-reversal symmetry. We discuss the ground-state properties of a pure PDW state and a pure CDW state, and show that the PDW checkerboard state yields a vortex-antivortex lattice. For the checkerboard state, we consider a situation in which both CDW and PDW orders are present at low $T$ and show that the presence of both condensates induces the long-sought chiral $s+id_{xy}$ superconductivity.
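The abstract does not reproduce the Ginzburg-Landau functional. The generic mean-field mechanism by which a quartic cross term chooses between a "stripe" state (one momentum direction) and a "checkerboard" state (both directions) can be sketched with a two-component functional $F = α(x_1 + x_2) + β(x_1 + x_2)^2 + γ x_1 x_2$, where $x_i = |Δ_i|^2$. The coefficients $α, β, γ$ are hypothetical, and the sketch deliberately ignores the PDW/CDW structure, curvature effects, and phase terms of the actual theory:

```python
from typing import Tuple


def ground_state(alpha: float, beta: float, gamma: float) -> Tuple[str, float]:
    """Mean-field minimum of F = alpha*s + beta*s**2 + gamma*x1*x2, with s = x1 + x2
    and x_i = |Delta_i|^2 for two components with perpendicular momenta.

    For fixed s, the cross term is minimized at x1*x2 = 0 (stripe) when gamma > 0
    and at x1 = x2 (checkerboard) when gamma < 0, so comparing these two suffices.
    """
    if alpha >= 0:
        return "disordered", 0.0
    if beta <= 0 or 4 * beta + gamma <= 0:
        raise ValueError("functional must be bounded from below")
    # Stripe: x1 = x, x2 = 0 -> F = alpha*x + beta*x^2, minimum -alpha^2 / (4 beta).
    f_stripe = -alpha ** 2 / (4 * beta)
    # Checkerboard: x1 = x2 = x -> F = 2 alpha x + (4 beta + gamma) x^2.
    f_checker = -alpha ** 2 / (4 * beta + gamma)
    if f_checker < f_stripe:
        return "checkerboard", f_checker
    if f_checker > f_stripe:
        return "stripe", f_stripe
    return "degenerate", f_stripe  # gamma == 0: any split of s costs the same


for g in (0.5, 0.0, -0.5):
    print(g, ground_state(-1.0, 1.0, g))
```

At $γ = 0$ every distribution of weight between the two components is degenerate, a toy analogue of the enlarged symmetry the abstract describes; a small cross term of either sign lifts it.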

preprint2015arXiv

Reply to "A strong coupling critique of spin fluctuation driven charge order in underdoped cuprates"

We reply to the criticism by the authors of arXiv:1502.02782 of the spin-fluctuation scenario for charge order in the cuprates. The authors of arXiv:1502.02782 argued that spin-fluctuation exchange cannot give rise to charge order with the observed momentum $(Q, 0)/(0, Q)$ due to the absence of nesting for half of the fermions involved. We explicitly show that the instability towards charge order exists even in the "worst-case" scenario of anti-nesting for half of the fermions.

preprint2015arXiv

Superconducting and charge-density-wave orders in the spin-fermion model: a comparative analysis

We present a comparative analysis of superconducting and charge-density-wave orders in the spin-fluctuation scenario for the cuprates. That spin-fluctuation exchange gives rise to d-wave superconductivity is well known. Several groups recently argued that the same spin-mediated interaction may also account for the charge-density-wave order with momenta $(Q,0)$ or $(0,Q)$ detected in underdoped cuprates. This has been questioned on the grounds that the charge-density-wave channel mixes fermions from both nested and anti-nested regions of the Fermi surface, and fermions in the anti-nested region have no natural tendency to form a bound state, even if the interaction is attractive. We show that anti-nesting is not an obstacle to charge order, but to see this one needs to go beyond the conventional Eliashberg approximation. We show that in the perfect nesting/anti-nesting case, when the velocities of hot fermions are either parallel or antiparallel, the onset temperatures in the superconducting and charge-density-wave channels are comparable for any magnetic correlation length $ξ$. The superconducting $T_{\rm sc}$ is larger than $T_{\rm cdw}$, but only numerically. When the velocities of hot fermions are not strictly parallel/antiparallel, $T_{\rm cdw}$ progressively decreases as $ξ$ decreases and vanishes at some critical $ξ$.

preprint2014arXiv

Charge-density-wave order with momentum $(2Q, 0)$ and $(0, 2Q)$ within the spin-fermion model: continuous and discrete symmetry breaking, preemptive composite order, and relation to pseudogap in hole-doped cuprates

We analyze charge order within the spin-fermion model. We show that the magnetically mediated interaction gives rise to charge order $Δ_k^Q = \langle c^\dagger_{{\bf k}+{\bf Q}} c_{{\bf k}-{\bf Q}}\rangle$ with momenta ${\bf Q}=Q_x =(2Q,0)$ and ${\bf Q}=Q_y =(0,2Q)$ if the magnetic correlation length $ξ$ exceeds some critical value. We argue that $Δ_k^Q$ and $Δ_{-k}^Q$ are not equivalent, and that their symmetric and antisymmetric combinations describe density modulations and bond currents, respectively. We derive the Ginzburg-Landau (GL) functional for the four-component $U(1)$ order parameters $Δ^Q_{\pm k}$ with ${\bf Q} = Q_x$ or $Q_y$. Within mean-field theory we find two types of CDW states, I and II, depending on system parameters. In state I, density and current modulations emerge with the same ${\bf Q} = Q_x$ or $Q_y$, breaking $Z_2$ lattice rotational symmetry, and differ in phase by $\pmπ/2$. The selection of $π/2$ or $-π/2$ additionally breaks $Z_2$ time-reversal symmetry, such that the total order parameter manifold is $U(1) \times Z_2 \times Z_2$. In state II, density and current modulations emerge with different $\bf Q$ and the order parameter manifold is $U(1) \times U(1) \times Z_2$. We go beyond mean-field theory and show that discrete symmetries are broken before long-range charge order sets in. For state I, the system first breaks $Z_2$ lattice rotational symmetry ($C_4 \to C_2$) at $T= T_n$ and develops nematic order, then breaks $Z_2$ time-reversal symmetry at $T_t < T_n$, and finally breaks the $U(1)$ symmetry of the common phase of the even and odd components of $Δ^Q_{k}$ at $T= T_{\rm cdw} < T_t < T_n$, developing true charge order. We argue that the preemptive orders raise $T_{\rm cdw}$ and reduce $T_{\rm sc}$, such that at large $ξ$ charge order may develop before superconductivity. We obtain the phase diagram and present a quantitative comparison with ARPES data for hole-doped cuprates.
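The distinction between the common $U(1)$ phase and the $Z_2$ time-reversal choice in state I can be made concrete with two complex numbers. A minimal sketch, assuming the even and odd parts are normalized as $\tfrac12(Δ_k^Q \pm Δ_{-k}^Q)$ (the factor $\tfrac12$ and the function names are illustrative choices, not the paper's notation); the $Z_2$ label is the sign of the $\pmπ/2$ phase shift between the current and density components:

```python
import cmath
import math
from typing import Tuple


def even_odd(delta_k: complex, delta_mk: complex) -> Tuple[complex, complex]:
    """Symmetric (density) and antisymmetric (bond-current) combinations of
    Delta_k^Q and Delta_{-k}^Q; the factor 1/2 is an assumed normalization."""
    return 0.5 * (delta_k + delta_mk), 0.5 * (delta_k - delta_mk)


def trs_label(delta_k: complex, delta_mk: complex, tol: float = 1e-9) -> int:
    """+1 or -1 if the current component is shifted by +pi/2 or -pi/2 relative to
    the density component; 0 if either vanishes or the shift is not +-pi/2."""
    rho, cur = even_odd(delta_k, delta_mk)
    if abs(rho) < tol or abs(cur) < tol:
        return 0
    shift = cmath.phase(cur / rho)  # unchanged by a common U(1) phase factor
    if math.isclose(shift, math.pi / 2, abs_tol=1e-6):
        return 1
    if math.isclose(shift, -math.pi / 2, abs_tol=1e-6):
        return -1
    return 0


u = cmath.exp(0.7j)  # arbitrary common phase
print(trs_label(1 + 1j, 1 - 1j), trs_label(u * (1 + 1j), u * (1 - 1j)), trs_label(1 - 1j, 1 + 1j))
```

Multiplying both components by a common phase leaves the label unchanged, which is why the $Z_2$ choice can be broken separately from, and per the abstract before, the $U(1)$ symmetry.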

preprint2014arXiv

Polar Kerr effect from chiral-nematic charge order

We analyze the polar Kerr effect in an itinerant electron system on a square lattice in the presence of a composite charge order proposed for the pseudogap state in underdoped cuprates. This composite charge order preserves discrete translational symmetries and is "chiral-nematic" in the sense that it breaks time-reversal symmetry, mirror symmetries in the $x$ and $y$ directions, and $C_4$ lattice rotation symmetry. The Kerr angle $θ_K$ in a $C_4$-symmetric system is proportional to the antisymmetric component of the anomalous Hall conductivity, $σ_{xy}-σ_{yx}$. We show that this result still holds when $C_4$ symmetry is broken. We show that for $σ_{xy}$ and $σ_{yx}$ to be non-zero, the mirror symmetries in the $x$ and $y$ directions must be broken, and that for $σ_{xy}-σ_{yx}$ to be non-zero, time-reversal symmetry must be broken. The chiral-nematic charge order satisfies all these conditions, so a non-zero signal in a polar Kerr effect experiment is symmetry allowed. We further show that to get a non-zero $θ_K$ in a one-band spin-fluctuation scenario in the absence of disorder, one has to extend the spin-mediated interaction to momenta away from $(π,π)$ and include particle-hole asymmetry. Alternatively, in the presence of disorder one can get a non-zero $θ_K$ from impurity scattering: either from skew scattering (with non-Gaussian disorder) or from particle-hole asymmetry in the case of Gaussian disorder. The impurity analysis here is similar to that in earlier works on the Kerr effect in a $p_x+ip_y$ superconductor; in our case, however, the magnitude of $θ_K$ is enhanced by the flattening of the Fermi surface in the "hot" regions that contribute most to charge order.
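The symmetry conditions on the conductivity tensor can be checked mechanically. A minimal sketch, assuming a 2×2 tensor (index 0 = $x$, 1 = $y$) and two standard rules: a mirror acts on a rank-2 tensor as $σ \to RσR^T$, and intact time-reversal symmetry (Onsager reciprocity with no time-reversal-odd order) forces $σ_{xy} = σ_{yx}$. Averaging a generic tensor over either operation projects onto the part that symmetry allows; the helper names are illustrative:

```python
from typing import List

Matrix = List[List[float]]

MIRROR_X: Matrix = [[-1.0, 0.0], [0.0, 1.0]]  # x -> -x
MIRROR_Y: Matrix = [[1.0, 0.0], [0.0, -1.0]]  # y -> -y


def transform(sigma: Matrix, R: Matrix) -> Matrix:
    """Apply an orthogonal point operation to a rank-2 tensor: R sigma R^T."""
    return [
        [sum(R[i][a] * sigma[a][b] * R[j][b] for a in range(2) for b in range(2))
         for j in range(2)]
        for i in range(2)
    ]


def symmetrize_mirror(sigma: Matrix, R: Matrix) -> Matrix:
    """Average over {identity, R}: the part of sigma a preserved mirror allows."""
    s = transform(sigma, R)
    return [[0.5 * (sigma[i][j] + s[i][j]) for j in range(2)] for i in range(2)]


def symmetrize_time_reversal(sigma: Matrix) -> Matrix:
    """With time reversal intact, Onsager reciprocity gives sigma_ij = sigma_ji."""
    return [[0.5 * (sigma[i][j] + sigma[j][i]) for j in range(2)] for i in range(2)]


def kerr_component(sigma: Matrix) -> float:
    """Antisymmetric Hall component sigma_xy - sigma_yx."""
    return sigma[0][1] - sigma[1][0]


sigma = [[1.0, 0.3], [0.1, 2.0]]  # generic, symmetry-free example tensor
print(symmetrize_mirror(sigma, MIRROR_X))
print(kerr_component(symmetrize_time_reversal(sigma)))
```

A preserved mirror removes both off-diagonal entries, while preserved time reversal leaves them but removes their difference, matching the two separate conditions stated in the abstract.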

preprint2013arXiv

Quantum-critical pairing in electron-doped cuprates

We revisit the problem of spin-mediated superconducting pairing at the antiferromagnetic quantum-critical point with ordering momentum $(π,π) = 2k_F$. The problem was previously considered by one of the authors; however, it was later pointed out that this analysis neglected umklapp processes in the spin polarization operator. We incorporate the umklapp terms and re-evaluate the normal-state self-energy and the critical temperature of the pairing instability. We show that the self-energy has a Fermi-liquid form and obtain the renormalization of the quasiparticle residue $Z$, the Fermi velocity, and the curvature of the Fermi surface. We argue that the pairing is a BCS-type problem, but go one step beyond BCS theory and calculate the critical temperature $T_c$ including the prefactor. Applying the results to electron-doped cuprates near optimal doping, we obtain $T_c \geq 10$ K, which agrees well with experiment.

preprint2013arXiv

Superconductivity at the onset of spin-density-wave order in a metal

We revisit the issue of superconductivity at the quantum-critical point (QCP) between a 2D paramagnet and a spin-density-wave metal with ordering momentum $(π,π)$. This problem is highly non-trivial because the system at criticality displays non-Fermi-liquid behavior and because the effective coupling constant $λ$ for the pairing is generally of order one, even when the actual interaction is smaller than the fermionic bandwidth. A previous study [M. A. Metlitski, S. Sachdev, Phys. Rev. B 82, 075128 (2010)] found that the renormalizations of the pairing vertex are stronger than in BCS theory and appear in powers of $\log^2(1/T)$, as in color superconductivity. We analyze the full gap equation and argue that, for the QCP problem, summing up the leading logarithms does not lead to a pairing instability. Yet we show that superconductivity has no threshold and appears even if the coupling is set to be small, because subleading logarithmic renormalizations diverge and give rise to a BCS-like relation $\log(1/T_c) \propto 1/λ$. We argue that the analogy with BCS is not accidental, as at small coupling superconductivity at a QCP comes predominantly from fermions which retain Fermi-liquid behavior at criticality. We compute $T_c$ for the actual $λ \sim 1$ and find that both Fermi-liquid and non-Fermi-liquid fermions contribute to the pairing. The value of $T_c$ agrees well with numerical results.

preprint2012arXiv

Residual Energy in Weak and Strong MHD Turbulence

Recent numerical and observational studies have revealed that the spectra of magnetic and velocity fluctuations in MHD turbulence have different scaling indices. This intriguing feature has recently been explained for weak MHD turbulence, that is, turbulence consisting of weakly interacting Alfven waves. However, astrophysical turbulence is strong in the majority of cases. In the present work, we propose a unifying picture that addresses weak and strong MHD turbulence on the same footing. We argue that magnetic and kinetic energies differ in both weak and strong MHD turbulence. Their difference, the so-called residual energy, is spontaneously generated by turbulence; its Fourier spectrum is $E_r(k) = E_v(k) - E_b(k) \propto -f_w(k_\parallel/k_\perp)\, k_\perp^{-2}$ in weak turbulence and $E_r(k) \propto -f_s(k_\parallel/k_\perp)\, k_\perp^{-3}$ in strong turbulence. Here $f_{w,s}(x)$ are functions that decline rapidly for $x > C_{w,s}$ and vary little for $x < C_{w,s}$, where $C_{w,s}$ are constants, and $k_\parallel$ and $k_\perp$ are the wave vectors parallel and perpendicular to the strong applied uniform magnetic field.

preprint2011arXiv

Residual Energy in Magnetohydrodynamic Turbulence

There is mounting evidence from solar wind observations and numerical simulations that kinetic and magnetic energies are not in equipartition in magnetohydrodynamic turbulence. The origin of their mismatch, the residual energy $E_r = E_v - E_b$, is not well understood. In the present work this effect is studied analytically in the regime of weak magnetohydrodynamic turbulence. We find that residual energy is spontaneously generated by the turbulent dynamics and has a negative sign, in good agreement with observations. We also show that the residual energy condenses around $k_\parallel = 0$, with its $k_\parallel$-spectrum broadening linearly with $k_\perp$, where $k_\parallel$ and $k_\perp$ are the wavenumbers parallel and perpendicular to the background magnetic field, and that the field-perpendicular spectrum of the residual energy scales as $E_r(k_\perp) \propto k_\perp^{-1}$ in the inertial interval. These results agree with numerical simulations. We propose that residual energy plays a fundamental role in Alfvenic turbulence and should be taken into account for the correct interpretation of observational and numerical data.
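Spectral indices such as $E_r(k_\perp) \propto k_\perp^{-1}$ are typically read off as the slope of a log-log fit over the inertial range. A minimal sketch: the residual energy is formed bin by bin as $E_r = E_v - E_b$ on a shared $k_\perp$ grid, and the index is the least-squares slope of $\log|E_r|$ against $\log k_\perp$ (the sign is checked separately). The synthetic spectra below are made up for illustration and are not data from the paper:

```python
import math
from typing import List, Sequence


def residual_spectrum(e_v: Sequence[float], e_b: Sequence[float]) -> List[float]:
    """E_r(k) = E_v(k) - E_b(k), evaluated bin by bin on a shared k grid."""
    return [v - b for v, b in zip(e_v, e_b)]


def loglog_slope(k: Sequence[float], e: Sequence[float]) -> float:
    """Least-squares slope of log|E| against log k (the spectral index)."""
    xs = [math.log(x) for x in k]
    ys = [math.log(abs(v)) for v in e]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic spectra: E_b exceeds E_v by 0.1 * k^-1 in every bin.
k_perp = [2.0 ** n for n in range(1, 9)]
e_v = [k ** -1.5 for k in k_perp]
e_b = [v + 0.1 / k for v, k in zip(e_v, k_perp)]
e_r = residual_spectrum(e_v, e_b)
print(all(r < 0 for r in e_r), round(loglog_slope(k_perp, e_r), 6))  # True -1.0
```

Fitting $|E_r|$ rather than $E_r$ is needed because the residual energy is negative; in real data the fit range should be restricted to the inertial interval.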

preprint2007arXiv

A Novel Model of Working Set Selection for SMO Decomposition Methods

In training Support Vector Machines (SVMs) by decomposition methods, working set selection is an important technique, and several effective schemes have been proposed for it. To improve working set selection, we propose a new model for working set selection in sequential minimal optimization (SMO) decomposition methods. This model selects B as the working set without reselection. Several of its properties are established with simple proofs, and experiments demonstrate that the proposed method is in general faster than existing methods.
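The abstract does not spell out the proposed selection rule, so the sketch below shows only a common baseline for comparison, not the authors' model: the maximal violating pair rule used in LIBSVM-style SMO solvers. It assumes the standard dual problem $\min_α \tfrac12 α^T Q α - e^T α$ subject to $y^T α = 0$ and $0 \le α_t \le C$, labels $y_t \in \{+1, -1\}$, and a precomputed gradient $∇f(α) = Qα - e$:

```python
import math
from typing import Optional, Sequence, Tuple


def select_working_set(
    alpha: Sequence[float],
    y: Sequence[int],
    grad: Sequence[float],
    C: float,
    eps: float = 1e-3,
) -> Optional[Tuple[int, int]]:
    """Maximal violating pair for the SVM dual
        min 0.5 * a^T Q a - e^T a   s.t.   y^T a = 0,  0 <= a_t <= C,
    where grad[t] is the t-th component of Q a - e.
    Returns (i, j), or None when the largest violation is at most eps."""
    best_up, i = -math.inf, -1   # max of -y_t * grad_t over I_up
    best_low, j = math.inf, -1   # min of -y_t * grad_t over I_low
    for t, (a, yt, g) in enumerate(zip(alpha, y, grad)):
        score = -yt * g
        # I_up: y_t * alpha_t can still increase without leaving [0, C].
        in_up = (yt == 1 and a < C) or (yt == -1 and a > 0)
        # I_low: y_t * alpha_t can still decrease without leaving [0, C].
        in_low = (yt == -1 and a < C) or (yt == 1 and a > 0)
        if in_up and score > best_up:
            best_up, i = score, t
        if in_low and score < best_low:
            best_low, j = score, t
    if i < 0 or j < 0 or best_up - best_low <= eps:
        return None  # approximately optimal (KKT satisfied to within eps): stop SMO
    return i, j


# At alpha = 0 the gradient is -e, whatever Q is.
print(select_working_set([0.0] * 4, [1, 1, -1, -1], [-1.0] * 4, C=1.0))  # (0, 2)
```

Each SMO iteration then optimizes only $α_i$ and $α_j$ analytically and updates the gradient; schemes that change how the pair is chosen, as the abstract's model does, plug in at this step.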


61 published item(s)

Beyond Continuity: Simulation-free Reconstruction of Discrete Branching Dynamics from Single-cell Snapshots

Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents

Byzantine-Robust Distributed Sparse Learning Revisited

How vehicles change lanes after encountering crashes: Empirical analysis and modeling

The AI Hippocampus: How Far are We From Human Memory?

Variable Basis Mapping for Real-Time Volumetric Visualization

BEDA: Belief Estimation as Probabilistic Constraints for Performing Strategic Dialogue Acts

Vestigial $d$-wave charge-$4e$ Superconductivity from Bidirectional Pair Density Waves

Many-body higher-order topological invariant for $C_n$-symmetric insulators

STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering

A Piecewise Monotonic Gait Phase Estimation Model for Controlling a Powered Transfemoral Prosthesis in Various Locomotion Modes

An Efficient Algorithm for the Partitioning Min-Max Weighted Matching Problem

Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech

GiantMIDI-Piano: A large-scale MIDI dataset for classical piano music

Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition

Mixed QCD-EW corrections for Higgs leptonic decay via $HW^+W^-$ vertex

Monte Carlo study of the pseudogap and superconductivity emerging from quantum magnetic fluctuations

Network-Level Adversaries in Federated Learning

NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism

Neural Dubber: Dubbing for Videos According to Scripts

SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

The dynamical exponent of a quantum critical itinerant ferromagnet: a Monte Carlo study

The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders

CatNet: music source separation system with mix-audio augmentation

Higher-order topological superconductors from Weyl semimetals

Speech enhancement with weakly labelled data from AudioSet

SU(4) symmetry in twisted bilayer graphene - an itinerant perspective

Topological and nematic superconductivity mediated by ferro-SU(4) fluctuations in twisted bilayer graphene

A hybrid text normalization system using multi-head self-attention for mandarin

Chiral Dirac Superconductors: Second-order and Boundary-obstructed Topology

Convolutional Embedding for Edit Distance

Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech

Interplay between superconductivity and non-Fermi liquid at a quantum critical point in a metal. II. The $γ$-model at a finite $T$ for $0<γ<1$

PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

Quantum Phase Transition in the Yukawa-SYK Model

Review of Text Style Transfer Based on Deep Learning

Solvable Strong-coupling Quantum Dot Model with a Non-Fermi-liquid Pairing Transition

Source separation with weakly labelled data: An approach to computational auditory scene analysis

Xiaomingbot: A Multilingual Robot News Reporter

The interplay between superconductivity and non-Fermi liquid at a quantum-critical point in a metal

The Physics of Pair Density Waves

Interplay between uni-directional and bi-directional charge-density-wave orders in underdoped cuprates

Superconductivity near a quantum-critical point: The special role of the first Matsubara frequency

Topological density-wave states in a particle-hole symmetric Weyl metal

Topological superconducting phases from inversion symmetry breaking order in spin-orbit-coupled systems

Trainable Frontend For Robust and Far-Field Keyword Spotting

Coexistence of charge-density-wave and pair-density-wave orders in underdoped cuprates

Enhancement of superconductivity at the onset of charge-density-wave order in a metal

Fluctuating charge order in the cuprates: spatial anisotropy and feedback from superconductivity

Interplay between pair-density-wave and charge-density-wave orders in underdoped cuprates

Reply to "A strong coupling critique of spin fluctuation driven charge order in underdoped cuprates"

Superconducting and charge-density-wave orders in the spin-fermion model: a comparative analysis

Charge-density-wave order with momentum $(2Q, 0)$ and $(0, 2Q)$ within the spin-fermion model: continuous and discrete symmetry breaking, preemptive composite order, and relation to pseudogap in hole-doped cuprates

Polar Kerr effect from chiral-nematic charge order

Quantum-critical pairing in electron-doped cuprates

Superconductivity at the onset of spin-density-wave order in a metal

Residual Energy in Weak and Strong MHD Turbulence

Residual Energy in Magnetohydrodynamic Turbulence

A Novel Model of Working Set Selection for SMO Decomposition Methods