Source author record

YuJun Wang

YuJun Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: physics.atom-ph, Sound, eess.AS, cond-mat.quant-gas, quant-ph, Artificial Intelligence, Computation and Language, Machine Learning, nucl-th

Catalog footprint

What is connected

23 works

9 topics

4 close collaborators


preprint, 2022, arXiv

Detect what you want: Target Sound Detection

Human beings can perceive a target sound type in a multi-source mixture signal through selective auditory attention; however, this capability has hardly been explored in machine hearing. This paper addresses the target sound detection (TSD) task, which aims to detect the target sound signal in a mixture audio when a reference audio of the target sound is given. We present a novel target sound detection network (TSDNet), which consists of two main parts: a conditional network, which generates a sound-discriminative conditional embedding vector representing the target sound, and a detection network, which takes both the mixture audio and the conditional embedding vector as inputs and produces the detection result for the target sound. These two networks can be jointly optimized with a multi-task learning approach to further improve performance. In addition, we study both strongly supervised and weakly supervised strategies for training TSDNet and propose a data augmentation method that mixes two samples. To facilitate this research, we build a target sound detection dataset (i.e., URBAN-TSD) based on the URBAN-SED and UrbanSound8K datasets, and experimental results indicate that our method achieves segment-based F scores of 76.3% and 56.8% on the strongly labelled and weakly labelled data, respectively.
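For readers unfamiliar with the metric quoted above, the following is a minimal sketch of a segment-based F score for a single target sound. It assumes the audio has already been cut into fixed-length segments and that each segment carries a binary activity flag; the segment length, the function name `segment_f_score`, and the toy data are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence


def segment_f_score(reference: Sequence[int], estimate: Sequence[int]) -> float:
    """F score over fixed-length segments.

    Each entry is 1 if the target sound is active in that segment, else 0.
    Returns 0.0 when there are no true positives (an assumed convention).
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must cover the same segments")

    # Count segment-level true positives, false positives and false negatives.
    tp = sum(1 for r, e in zip(reference, estimate) if r and e)
    fp = sum(1 for r, e in zip(reference, estimate) if not r and e)
    fn = sum(1 for r, e in zip(reference, estimate) if r and not e)

    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: six segments, hypothetical labels.
reference = [0, 1, 1, 1, 0, 0]
estimate = [0, 1, 1, 0, 1, 0]
score = segment_f_score(reference, estimate)
print(f"segment-based F score: {score:.3f}")
```

In this toy case there are two true positives, one false positive and one false negative, so precision and recall are both 2/3.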

preprint, 2022, arXiv

Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation

Learning an emotion embedding from reference audio is a straightforward approach to multi-emotion speech synthesis in encoder-decoder systems. However, how to obtain a better emotion embedding and how to inject it into the TTS acoustic model more effectively are still under investigation. In this paper, we propose an innovative constraint that helps the VAE extract emotion embeddings with better cluster cohesion. In addition, the obtained emotion embedding is used as a query to aggregate the latent representations of all encoder layers via attention. Moreover, the queries from the encoder layers themselves are also helpful. Experiments show that the proposed methods can enhance the encoding of comprehensive syntactic and semantic information and produce more expressive emotional speech.

preprint, 2022, arXiv

Learning Decoupling Features Through Orthogonality Regularization

Keyword spotting (KWS) and speaker verification (SV) are two important tasks in speech applications. Research shows that state-of-the-art KWS and SV models are trained independently on different datasets, since they are expected to learn distinctive acoustic features. However, humans can distinguish language content and speaker identity simultaneously. Motivated by this, we believe it is important to explore a method that can effectively extract common features while decoupling task-specific features. With this in mind, we develop a two-branch deep network (a KWS branch and an SV branch) with the same network structure and propose a novel decoupling feature learning method to improve the performance of KWS and SV simultaneously, where speaker-invariant keyword representations and keyword-invariant speaker representations are expected, respectively. Experiments are conducted on the Google Speech Commands Dataset (GSCD). The results demonstrate that the orthogonality regularization helps the network achieve state-of-the-art EERs of 1.31% and 1.87% on KWS and SV, respectively.
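The abstract does not state the exact form of the orthogonality regularization. One common way to express such a penalty, shown here only as an assumed illustration, is the mean squared dot product between the two branch embeddings of each sample, which is zero when the keyword and speaker features are orthogonal. The function name `orthogonality_penalty` and the toy vectors are hypothetical.

```python
from typing import Sequence


def orthogonality_penalty(
    kws_feats: Sequence[Sequence[float]],
    sv_feats: Sequence[Sequence[float]],
) -> float:
    """Mean squared dot product between paired KWS and SV embeddings.

    This is an assumed formulation for illustration, not the paper's loss.
    """
    if not kws_feats or len(kws_feats) != len(sv_feats):
        raise ValueError("both batches must be non-empty and the same size")

    total = 0.0
    for k, s in zip(kws_feats, sv_feats):
        if len(k) != len(s):
            raise ValueError("paired embeddings must have the same dimension")
        dot = sum(a * b for a, b in zip(k, s))
        total += dot * dot  # zero only when the pair is orthogonal
    return total / len(kws_feats)


# Toy batch of two samples: the first pair is orthogonal, the second is not.
kws_batch = [[1.0, 0.0], [1.0, 1.0]]
sv_batch = [[0.0, 1.0], [1.0, 1.0]]
penalty = orthogonality_penalty(kws_batch, sv_batch)
print(f"orthogonality penalty: {penalty}")
```

In a training setup, a term like this would typically be added to the task losses with a weighting coefficient, which is likewise an assumption here.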

preprint, 2022, arXiv

Msdtron: a high-capability multi-speaker speech synthesis system for diverse data using characteristic information

In multi-speaker speech synthesis, data from a number of speakers tend to be highly diverse because the speakers may differ widely in age, speaking style, emotion, and so on. It is important but challenging to improve the modeling capabilities for multi-speaker speech synthesis. To address the issue, this paper proposes a high-capability speech synthesis system, called Msdtron, in which 1) a representation of the harmonic structure of speech, called the excitation spectrogram, is designed to directly guide the learning of harmonics in the mel-spectrogram, and 2) a conditional gated LSTM (CGLSTM) is proposed to control the flow of text content information through the network by re-weighting the gates of the LSTM using speaker information. The experiments show a significant reduction in the reconstruction error of the mel-spectrogram when training the multi-speaker model, and a great improvement is observed in the subjective evaluation of the speaker-adapted model.

preprint, 2022, arXiv

PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control

Sequence expansion between encoder and decoder is a critical challenge in sequence-to-sequence TTS. Attention-based methods achieve great naturalness but suffer from instability issues such as missing and repeated phonemes, and they do not offer accurate duration control. Duration-informed methods, on the contrary, can easily adjust phoneme duration but show obvious degradation in speech naturalness. This paper proposes PAMA-TTS to address the problem. It takes advantage of both flexible attention and explicit duration models. Based on the monotonic attention mechanism, PAMA-TTS also leverages the token duration and the relative position of a frame, especially countdown information, i.e., in how many future frames the present phoneme will end. These help the attention move forward along the token sequence under soft but reliable control. Experimental results show that PAMA-TTS achieves the highest naturalness while having on-par or even better duration controllability than the duration-informed model.
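The countdown information described above can be illustrated with a short sketch that expands per-phoneme durations (in frames) into a per-frame countdown. The convention that the last frame of a phoneme has countdown 0, and the function name `frame_countdown`, are assumptions made for this example; the abstract does not fix them.

```python
from typing import List, Sequence


def frame_countdown(durations: Sequence[int]) -> List[int]:
    """For every frame, the number of future frames before its phoneme ends.

    `durations` holds the length of each phoneme in frames. The last frame of
    a phoneme gets countdown 0 (an assumed convention).
    """
    countdown: List[int] = []
    for d in durations:
        if d < 0:
            raise ValueError("durations must be non-negative")
        # A phoneme of d frames contributes d-1, d-2, ..., 0.
        countdown.extend(range(d - 1, -1, -1))
    return countdown


# Two phonemes lasting 2 and 3 frames.
print(frame_countdown([2, 3]))  # [1, 0, 2, 1, 0]
```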

preprint, 2022, arXiv

Pseudo strong labels for large scale weakly supervised audio tagging

Large-scale audio tagging datasets inevitably contain imperfect labels, such as clip-wise annotated (temporally weak) tags with no exact onsets and offsets, due to the high cost of manual labeling. This work proposes pseudo strong labels (PSL), a simple label augmentation framework that enhances the supervision quality for large-scale weakly supervised audio tagging. A machine annotator is first trained on a large weakly supervised dataset and then provides finer supervision for a student model. Using PSL, we achieve an mAP of 35.95 on the balanced train subset of Audioset using a MobileNetV2 back-end, significantly outperforming approaches without PSL. An analysis is provided which reveals that PSL mitigates missing labels. Lastly, we show that models trained with PSL also generalize better to the Freesound datasets (FSD) than their weakly trained counterparts.

preprint, 2021, arXiv

AutoKWS: Keyword Spotting with Differentiable Architecture Search

Smart audio devices are gated by an always-on lightweight keyword spotting program to reduce power consumption. It is, however, challenging to design models that have both high accuracy and low latency for accurate and fast responsiveness. Many efforts have been made to develop end-to-end neural networks, in which depthwise separable convolutions, temporal convolutions, and LSTMs are adopted as building units. Nonetheless, these networks, designed with human expertise, may not achieve an optimal trade-off in an expansive search space. In this paper, we propose to leverage recent advances in differentiable neural architecture search to discover more efficient networks. Our searched model attains 97.2% top-1 accuracy on Google Speech Command Dataset v1 with only about 100K parameters.

preprint, 2020, arXiv

Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis

Attention-based seq2seq text-to-speech systems, especially those that use self-attention networks (SAN), have achieved state-of-the-art performance. However, an expressive corpus with rich prosody is still challenging to model because 1) prosodic aspects, which span different sentential granularities and mainly determine acoustic expressiveness, are difficult to quantize and label, and 2) the current seq2seq framework extracts prosodic information solely from a text encoder, which easily collapses to an averaged expression for expressive content. In this paper, we propose a context extractor, built upon a SAN-based text encoder, to sufficiently exploit the sentential context over an expressive corpus for seq2seq-based TTS. Our context extractor first collects prosody-related sentential context information from different SAN layers and then aggregates it to learn a comprehensive sentence representation that enhances the expressiveness of the final generated speech. Specifically, we investigate two methods of context aggregation: 1) direct aggregation, which directly concatenates the outputs of different SAN layers, and 2) weighted aggregation, which uses multi-head attention to automatically learn the contributions of different SAN layers. Experiments on two expressive corpora show that our approach can produce more natural speech with much richer prosodic variations, and that weighted aggregation is superior in modeling expressivity.

preprint, 2016, arXiv

Heteronuclear Efimov scenario with positive intraspecies scattering length

We investigate theoretically and experimentally the heteronuclear Efimov scenario for a three-body system that consists of two bosons and one distinguishable particle with positive intraspecies scattering lengths. The three-body parameter at the three-body scattering threshold and the scaling factor between consecutive Efimov resonances are found to be controlled by the scattering length between the two bosons, approximately independent of short-range physics. We observe two excited-state Efimov resonances in the three-body recombination spectra of an ultracold mixture of fermionic $^{6}$Li and bosonic $^{133}$Cs atoms close to a Li-Cs Feshbach resonance, where the Cs-Cs interaction is positive. Deviation of the obtained scaling factor of 4.0(3) from the universal prediction of 4.9 and the absence of the ground state Efimov resonance shed new light on the interpretation of the universality and the discrete scaling behavior of heteronuclear Efimov physics.
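As a small worked example of the numbers quoted above, the sketch below reads 4.0(3) as 4.0 with an uncertainty of 0.3 and compares it with the prediction of 4.9. It also shows how a constant scaling factor generates a geometric sequence of resonance positions. Treating the quoted uncertainty as one standard deviation and the prediction as exact, as well as the starting value `a0` and the helper name `resonance_positions`, are assumptions for illustration only.

```python
from typing import List


def resonance_positions(a0: float, factor: float, count: int) -> List[float]:
    """Magnitudes of scattering lengths at consecutive resonances.

    Discrete scaling means each position is `factor` times the previous one.
    `a0` is a hypothetical starting magnitude in arbitrary units.
    """
    return [a0 * factor**n for n in range(count)]


measured = 4.0      # scaling factor quoted as 4.0(3)
uncertainty = 0.3   # the "(3)" applies to the last quoted digit
predicted = 4.9     # universal prediction quoted in the abstract

# Deviation in units of the quoted uncertainty (assumed to be one sigma).
deviation = (predicted - measured) / uncertainty
print(f"deviation: {deviation:.1f} standard uncertainties")

# Geometric spacing of resonances for the measured factor (arbitrary units).
positions = resonance_positions(a0=1.0, factor=measured, count=3)
print(positions)
```

Under these assumptions the measured factor lies about three standard uncertainties below the prediction.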

preprint, 2014, arXiv

Few-body physics of ultracold atoms and molecules with long-range interactions

The quantum mechanical few-body problem at ultracold energies poses severe challenges to theoretical techniques, particularly when long-range interactions are present that decay only as a power-law potential. In this paper we review the techniques and progress in studies of universal few-body physics for ultracold atoms, particularly those related to long-range interactions.

preprint, 2014, arXiv

Universal van der Waals Physics for Three Ultracold Atoms

Experimental studies with ultracold atoms have enabled major breakthroughs in understanding three-body physics, historically a fundamental yet challenging problem. This is because the interactions among ultracold atoms can be precisely varied using magnetically tunable scattering resonances known as Feshbach resonances. The collisions of ultracold atoms have been discovered to have many universal aspects near the unitarity limit. Away from this limit, many quantum states are expected to be active during a three-body collision, making the collisional observables practically unpredictable. Here we report a major development in predicting three-body ultracold scattering rates by properly building in the pairwise van der Waals interactions plus the multi-spin properties of a tunable Feshbach resonance state characterized by two known dimensionless two-body parameters. Numerical solution of the Schrödinger equation then predicts the three-atom collisional rates without adjustable fitting parameters. Our calculations show quantitative agreement in magnitude and in feature position and shape across the full range of tuning of measured rate coefficients for three-body recombination and atom-dimer collisions involving ultracold Cs atoms.

preprint, 2013, arXiv

Ultracold mixtures of atomic Li-6 and Cs-133 with tunable interactions

We report the experimental and theoretical study of two-body interactions in a $^{6}$Li-$^{133}$Cs Fermi-Bose mixture. Using a translatable dipole trap setup, we have successfully trapped the two species in the same trap with temperatures of a few microkelvins. By monitoring atom number loss and inter-species thermalization, we identify five s-wave interspecies Feshbach resonances in the lowest two scattering channels. We construct a coupled channels model using molecular potentials to fit and characterize these resonances. Two of the resonances are as wide as 60 G and thus should be suitable for creating Feshbach molecules and searching for universal few-body scaling.

preprint, 2012, arXiv

Universal three-body parameter in heteronuclear atomic systems

In Efimov physics, a three-body parameter (3BP), previously regarded as nonuniversal, uniquely defines bound and scattering properties of three particles. A universal 3BP, however, has recently been shown in experiments and theory in ultracold homonuclear gases. Our present study further predicts a universal 3BP for heteronuclear atomic systems near broad Feshbach resonances, and provides physical interpretations for its origin. We show that for a system composed of two light atoms and one heavy atom, the physical origin of the universal 3BP is similar to the homonuclear case, while for systems composed of one light atom and two heavy atoms the universality of the 3BP is mostly controlled by the heavy-heavy interatomic properties. This new universality is explained by the universal properties of the van der Waals interactions in a simple Born-Oppenheimer (BO) picture. Finally, we show the numerically determined 3BPs for some combinations of alkali atoms used in ultracold experiments.

preprint, 2012, arXiv

Universal three-body recombination via resonant d-wave interactions

For a system of three identical bosons interacting via short-range forces, when two of the atoms are about to form a two-body s-wave dimer, there exists an infinite number of three-body bound states. This effect is the well-known Efimov effect. These three-body states (Efimov states) are found to be universal for ultracold atomic gases and the lowest Efimov state crosses the three-body break-up threshold when the s-wave two-body scattering length is $a \approx -9.73 r_{\rm vdW}$, $r_{\rm vdW}$ being the van der Waals length. This article focuses on a generalized version of this Efimov scenario, where two of the atoms are about to form a two-body d-wave dimer, which leads to strong d-wave interactions. In a recent paper [B. Gao, Phys. Rev. A 62, 050702(R) (2000)], Bo Gao has predicted that for broad resonances the d-wave dimer is always formed near $a \approx 0.956 r_{\rm vdW}$. Here we find that a single universal three-body state associated with the d-wave dimer is also formed near the three-body break-up threshold at $a \approx 1.09 r_{\rm vdW}$ and its signature can be found through enhancement of the three-body recombination. The three-body effective potential curves that are crucial for understanding the recombination dynamics are also calculated and analyzed. An improved method to calculate the couplings, effective potential curves, and recombination rate coefficients is presented.

preprint, 2011, arXiv

A new class of three-body states

We calculate the three-body spectrum for identical bosons interacting via attractive $1/r^2$ potentials. We have found an infinite number of three-body states even when the pair interactions are too weak to support any two-body states. These new states thus share this surprising scenario with the Efimov effect, but are not themselves Efimov states. Our effect occurs for both identical bosons and identical fermions, and it persists in the presence of two-body bound states.

preprint, 2011, arXiv

Efimov physics in heteronuclear four-body systems

We study three- and four-body Efimov physics in a heteronuclear atomic system with three identical heavy bosonic atoms and one light atom. We show that exchange of the light atom between the heavy atoms leads to both three- and four-body features in the low-energy inelastic rate constants that trace to the Efimov effect. Further, the effective interaction generated by this exchange can provide an additional mechanism for control in ultracold experiments. Finally, we find that there is no true four-body Efimov effect (that is, no infinite number of four-body states in the absence of two- and three-body bound states), resolving a decades-long controversy.

preprint, 2011, arXiv

The Efimov effect for three interacting bosonic dipoles

Three oriented bosonic dipoles are treated using the hyperspherical adiabatic representation, providing numerical evidence that the Efimov effect persists near a two-dipole resonance and in a system where angular momentum is not conserved. Our results further show that the Efimov features in scattering observables become universal, with a known three-body parameter, i.e. the resonance energies depend only on the two-body physics, which also has implications for the universal spectrum of the four-dipole problem. Moreover, the Efimov states should be long-lived, which is favorable for their creation and manipulation in ultracold dipolar gases. Finally, deeply-bound two-dipole states are shown to be relatively stable against collisions with a third dipole, owing to the emergence of a repulsive interaction originating in the angular momentum nonconservation for this system.

preprint, 2011, arXiv

Universal bound and scattering properties for two dipoles

The bound state and low-energy scattering properties of two oriented dipoles are investigated for both bosonic and fermionic symmetries. Interestingly, a universal scaling emerges for the expectation value of the angular momentum for deeply-bound two-dipole states. This scaling traces to the pendulum motion of two dipoles in the strong-dipole regime. The low-energy scattering phase shifts of two dipoles also show universal behavior. These universal observations make connections to the scaling laws reported in Refs. [1, 2] for three dipoles. Atomic units are used throughout this work.

preprint, 2011, arXiv

Universal three-body physics for fermionic dipoles

A study of the universal physics for three oriented fermionic dipoles in the hyperspherical adiabatic representation predicts that a single long-lived three-dipole state, which exists in only one three-body symmetry, should form near a two-dipole resonance. Our analysis reveals the spatial configuration of the universal state, and the scaling of its binding energy and lifetime with the strength of the dipolar interaction. In addition, three-body recombination of fermionic dipoles is found to be important even at ultracold energies. An additional finding is that an effective long-range repulsion arises between a dipole and a dipolar dimer that is tunable via dipolar interactions.

preprint, 2010, arXiv

Adiabatic Floquet Picture for Hydrogen Atom in an Intense Laser Field

We develop an adiabatic Floquet picture in the length gauge to describe the dynamics of a hydrogen atom in an intense laser field. In this picture, we discuss the roles played by frequency and intensity in terms of adiabatic potentials and the couplings between them, which gives a physical and intuitive picture for quantum systems exposed to a laser field. For simplicity, we analyze hydrogen and give the adiabatic potential curves as well as some physical quantities that can be readily calculated for the ground state. Both linearly and circularly polarized laser fields are discussed.

preprint, 2010, arXiv

Cold three-body collisions in hydrogen-hydrogen-alkali atomic system

We have studied hydrogen-hydrogen-alkali three-body systems in the adiabatic hyperspherical representation. For the spin-stretched case, there exists a single $X$H molecular state when $X$ is one of the bosonic alkali atoms: $^{7}$Li, $^{23}$Na, $^{39}$K, $^{87}$Rb and $^{133}$Cs. As a result, the only recombination process is the one that leads to formation of $X$H molecules, H + H + $X$ $\rightarrow$ $X$H + H, and such molecules will be stable against vibrational relaxation. We have calculated the collision rates for recombination and collision induced dissociation as well as the elastic cross-sections for H + $X$H collisions up to a temperature of 0.5 K, including the partial wave contributions from $J^\Pi = 0^+$ to $5^-$. We have also found that there is just one three-body bound state for such systems for $J^\Pi = 0^+$ and no bound states for higher angular momenta.

preprint, 2010, arXiv

Ultracold three-body collisions near narrow Feshbach resonances

We study ultracold three-body collisions of bosons and fermions when the interatomic interaction is tuned near a narrow Feshbach resonance. We show that the width of the resonance has a substantial impact on the collisional properties of ultracold gases in the strongly interacting regime. We obtain numerical and analytical results that allow us to identify universal features related to the resonance width. For narrow resonances, we have found a suppression of all inelastic processes in boson systems leading to deeply bound states and an enhancement for fermion systems.

preprint, 2010, arXiv

Universal three-body physics at finite energy near Feshbach resonances

We find that universal three-body physics extends beyond the threshold regime to non-zero energies. For ultracold atomic gases with a negative two-body $s$-wave scattering length near a Feshbach resonance, we show the resonant peaks characteristic of Efimov physics persist in three-body recombination to higher collision energies. For this and other inelastic processes, we use the adiabatic hyperspherical representation to derive universal analytical expressions for their dependence on the scattering length, the collision energy, and (for narrow resonances) the effective range. These expressions are supported by full numerical solutions of the Schrödinger equation and display log-periodic dependence on energy characteristic of Efimov physics. This dependence is robust and might be used to experimentally observe several Efimov features.
