Researcher profile

Yutian Wang

Yutian Wang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2022arXiv

Atomic-scale Deformation Process of Glasses Unveiled by Stress-induced Structural Anisotropy

Experimentally resolving atomic-scale structural changes of a deformed glass remains challenging owing to the disordered nature of glass structure. Here, we show that the structural anisotropy emerges as a general hallmark for different types of glasses (metallic glasses, oxide glass, amorphous selenium, and polymer glass) after thermo-mechanical deformation, and it is highly correlates with local nonaffine atomic displacements detected by the high-energy X-ray diffraction technique. By analyzing the anisotropic pair density function, we unveil the atomic-level mechanism responsible for the plastic flow, which notably differs between metallic glasses and covalent glasses. The structural rearrangements in metallic glasses are mediated through cutting and formation of atomic bonds, which occurs in some localized inelastic regions embedded in elastic matrix, whereas that of covalent glasses is mediated through the rotation of atomic bonds or chains without bond length change, which occurs in a less localized manner.

preprint2022arXiv

DBT-Net: Dual-branch federative magnitude and phase estimation with attention-in-attention transformer for monaural speech enhancement

The decoupling-style concept begins to ignite in the speech enhancement area, which decouples the original complex spectrum estimation task into multiple easier sub-tasks i.e., magnitude-only recovery and the residual complex spectrum estimation)}, resulting in better performance and easier interpretability. In this paper, we propose a dual-branch federative magnitude and phase estimation framework, dubbed DBT-Net, for monaural speech enhancement, aiming at recovering the coarse- and fine-grained regions of the overall spectrum in parallel. From the complementary perspective, the magnitude estimation branch is designed to filter out dominant noise components in the magnitude domain, while the complex spectrum purification branch is elaborately designed to inpaint the missing spectral details and implicitly estimate the phase information in the complex-valued spectral domain. To facilitate the information flow between each branch, interaction modules are introduced to leverage features learned from one branch, so as to suppress the undesired parts and recover the missing components of the other branch. Instead of adopting the conventional RNNs and temporal convolutional networks for sequence modeling, we employ a novel attention-in-attention transformer-based network within each branch for better feature learning. More specially, it is composed of several adaptive spectro-temporal attention transformer-based modules and an adaptive hierarchical attention module, aiming to capture long-term time-frequency dependencies and further aggregate intermediate hierarchical contextual information. Comprehensive evaluations on the WSJ0-SI84 + DNS-Challenge and VoiceBank + DEMAND dataset demonstrate that the proposed approach consistently outperforms previous advanced systems and yields state-of-the-art performance in terms of speech quality and intelligibility.

preprint2022arXiv

Deep BSDE-ML Learning and Its Application to Model-Free Optimal Control

A modified Deep BSDE (backward differential equation) learning method with measurability loss, called Deep BSDE-ML method, is introduced in this paper to solve a kind of linear decoupled forward-backward stochastic differential equations (FBSDEs), which is encountered in the policy evaluation of learning the optimal feedback policies of a class of stochastic control problems. The measurability loss is characterized via the measurability of BSDE's state at the forward initial time, which differs from that related to terminal state of the known Deep BSDE method. Though the minima of the two loss functions are shown to be equal, this measurability loss is proved to be equal to the expected mean squared error between the true diffusion term of BSDE and its approximation. This crucial observation extends the application of the Deep BSDE method -- approximating the gradients of the solution of a partial differential equation (PDE) instead of the solution itself. Simultaneously, a learning-based framework is introduced to search an optimal feedback control of a deterministic nonlinear system. Specifically, by introducing Gaussian exploration noise, we are aiming to learn a robust optimal controller under this stochastic case. This reformulation sacrifices the optimality to some extent, but as suggested in reinforcement learning (RL) exploration noise is essential to enable the model-free learning.

preprint2022arXiv

Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement

Curriculum learning begins to thrive in the speech enhancement area, which decouples the original spectrum estimation task into multiple easier sub-tasks to achieve better performance. Motivated by that, we propose a dual-branch attention-in-attention transformer dubbed DB-AIAT to handle both coarse- and fine-grained regions of the spectrum in parallel. From a complementary perspective, a magnitude masking branch is proposed to coarsely estimate the overall magnitude spectrum, and simultaneously a complex refining branch is elaborately designed to compensate for the missing spectral details and implicitly derive phase information. Within each branch, we propose a novel attention-in-attention transformer-based module to replace the conventional RNNs and temporal convolutional networks for temporal sequence modeling. Specifically, the proposed attention-in-attention transformer consists of adaptive temporal-frequency attention transformer blocks and an adaptive hierarchical attention module, aiming to capture long-term temporal-frequency dependencies and further aggregate global hierarchical contextual information. Experimental results on Voice Bank + DEMAND demonstrate that DB-AIAT yields state-of-the-art performance (e.g., 3.31 PESQ, 95.6% STOI and 10.79dB SSNR) over previous advanced systems with a relatively small model size (2.81M).

preprint2022arXiv

Joint magnitude estimation and phase recovery using Cycle-in-Cycle GAN for non-parallel speech enhancement

For the lack of adequate paired noisy-clean speech corpus in many real scenarios, non-parallel training is a promising task for DNN-based speech enhancement methods. However, because of the severe mismatch between input and target speeches, many previous studies only focus on the magnitude spectrum estimation and remain the phase unaltered, resulting in the degraded speech quality under low signal-to-noise ratio conditions. To tackle this problem, we decouple the difficult target w.r.t. original spectrum optimization into spectral magnitude and phase, and a novel Cycle-in-Cycle generative adversarial network (dubbed CinCGAN) is proposed to jointly estimate the spectral magnitude and phase information stage by stage under unpaired data. In the first stage, we pretrain a magnitude CycleGAN to coarsely estimate the spectral magnitude of clean speech. In the second stage, we incorporate the pretrained CycleGAN with a complex-valued CycleGAN as a cycle-in-cycle structure to simultaneously recover phase information and refine the overall spectrum. Experimental results demonstrate that the proposed approach significantly outperforms previous baselines under non-parallel training. The evaluation on training the models with standard paired data also shows that CinCGAN achieves remarkable performance especially in reducing background noise and speech distortion.

preprint2022arXiv

Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement

Due to the high computational complexity to model more frequency bands, it is still intractable to conduct real-time full-band speech enhancement based on deep neural networks. Recent studies typically utilize the compressed perceptually motivated features with relatively low frequency resolution to filter the full-band spectrum by one-stage networks, leading to limited speech quality improvements. In this paper, we propose a coordinated sub-band fusion network for full-band speech enhancement, which aims to recover the low- (0-8 kHz), middle- (8-16 kHz), and high-band (16-24 kHz) in a step-wise manner. Specifically, a dual-stream network is first pretrained to recover the low-band complex spectrum, and another two sub-networks are designed as the middle- and high-band noise suppressors in the magnitude-only domain. To fully capitalize on the information intercommunication, we employ a sub-band interaction module to provide external knowledge guidance across different frequency bands. Extensive experiments show that the proposed method yields consistent performance advantages over state-of-the-art full-band baselines.

preprint2022arXiv

Unsupervised Quantized Prosody Representation for Controllable Speech Synthesis

In this paper, we propose a novel prosody disentangle method for prosodic Text-to-Speech (TTS) model, which introduces the vector quantization (VQ) method to the auxiliary prosody encoder to obtain the decomposed prosody representations in an unsupervised manner. Rely on its advantages, the speaking styles, such as pitch, speaking velocity, local pitch variance, etc., are decomposed automatically into the latent quantize vectors. We also investigate the internal mechanism of VQ disentangle process by means of a latent variables counter and find that higher value dimensions usually represent prosody information. Experiments show that our model can control the speaking styles of synthesis results by directly manipulating the latent variables. The objective and subjective evaluations illustrated that our model outperforms the popular models.

preprint2020arXiv

Soliton Distillation in Fiber Lasers

Pure solitons are for the first time distilled from the resonant continuous wave (CW) background in a fiber laser by utilizing nonlinear Fourier transform (NFT). It is identified that the soliton and the resonant CW background have different eigenvalue distributions in the nonlinear frequency domain. Similar to water distillation, we propose the approach of soliton distillation, by making NFT on a steady pulse generated from a fiber laser, then filtering out the eigenvalues of the resonant CW background in the nonlinear frequency domain, and finally recovering the soliton by inverse NFT (INFT). Simulation results verify that the soliton can be distinguished from the resonant CW background in the nonlinear frequency domain and pure solitons can be obtained by INFT.