Researcher profile

Xiongwei Zhang

Xiongwei Zhang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2022arXiv

A New Adjacency Matrix Configuration in GCN-based Models for Skeleton-based Action Recognition

Human skeleton data has received increasing attention in action recognition due to its background robustness and high efficiency. In skeleton-based action recognition, graph convolutional network (GCN) has become the mainstream method. This paper analyzes the fundamental factor for GCN-based models -- the adjacency matrix. We notice that most GCN-based methods conduct their adjacency matrix based on the human natural skeleton structure. Based on our former work and analysis, we propose that the human natural skeleton structure adjacency matrix is not proper for skeleton-based action recognition. We propose a new adjacency matrix that abandons all rigid neighbor connections but lets the model adaptively learn the relationships of joints. We conduct extensive experiments and analysis with a validation model on two skeleton-based action recognition datasets (NTURGBD60 and FineGYM). Comprehensive experimental results and analysis reveals that 1) the most widely used human natural skeleton structure adjacency matrix is unsuitable in skeleton-based action recognition; 2) The proposed adjacency matrix is superior in model performance, noise robustness and transferability.

preprint2021arXiv

Low Bit-Rate Wideband Speech Coding: A Deep Generative Model based Approach

Traditional low bit-rate speech coding approach only handles narrowband speech at 8kHz, which limits further improvements in speech quality. Motivated by recent successful exploration of deep learning methods for image and speech compression, this paper presents a new approach through vector quantization (VQ) of mel-frequency cepstral coefficients (MFCCs) and using a deep generative model called WaveGlow to provide efficient and high-quality speech coding. The coding feature is sorely an 80-dimension MFCCs vector for 16kHz wideband speech, then speech coding at the bit-rate throughout 1000-2000 bit/s could be scalably implemented by applying different VQ schemes for MFCCs vector. This new deep generative network based codec works fast as the WaveGlow model abandons the sample-by-sample autoregressive mechanism. We evaluated this new approach over the multi-speaker TIMIT corpus, and experimental results demonstrate that it provides better speech quality compared with the state-of-the-art classic MELPe codec at lower bit-rate.

preprint2020arXiv

When Automatic Voice Disguise Meets Automatic Speaker Verification

The technique of transforming voices in order to hide the real identity of a speaker is called voice disguise, among which automatic voice disguise (AVD) by modifying the spectral and temporal characteristics of voices with miscellaneous algorithms are easily conducted with softwares accessible to the public. AVD has posed great threat to both human listening and automatic speaker verification (ASV). In this paper, we have found that ASV is not only a victim of AVD but could be a tool to beat some simple types of AVD. Firstly, three types of AVD, pitch scaling, vocal tract length normalization (VTLN) and voice conversion (VC), are introduced as representative methods. State-of-the-art ASV methods are subsequently utilized to objectively evaluate the impact of AVD on ASV by equal error rates (EER). Moreover, an approach to restore disguised voice to its original version is proposed by minimizing a function of ASV scores w.r.t. restoration parameters. Experiments are then conducted on disguised voices from Voxceleb, a dataset recorded in real-world noisy scenario. The results have shown that, for the voice disguise by pitch scaling, the proposed approach obtains an EER around 7% comparing to the 30% EER of a recently proposed baseline using the ratio of fundamental frequencies. The proposed approach generalizes well to restore the disguise with nonlinear frequency warping in VTLN by reducing its EER from 34.3% to 18.5%. However, it is difficult to restore the source speakers in VC by our approach, where more complex forms of restoration functions or other paralinguistic cues might be necessary to restore the nonlinear transform in VC. Finally, contrastive visualization on ASV features with and without restoration illustrate the role of the proposed approach in an intuitive way.