Source author record

Xiangdong Wang

Xiangdong Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

eess.AS Machine Learning Sound cond-mat.mtrl-sci Human-Computer Interaction nlin.PS physics.optics

Catalog footprint

What is connected

6 works

7 topics

4 close collaborators

Preprint · 2022 · arXiv

Couple Learning for semi-supervised sound event detection

The recently proposed Mean Teacher method, which exploits large-scale unlabeled data in a self-ensembling manner, has achieved state-of-the-art results on several semi-supervised learning benchmarks. Motivated by these achievements, this paper proposes an effective Couple Learning method that combines a well-trained model with a Mean Teacher model. The proposed pseudo-label generation model (PLG) augments the strongly- and weakly-labeled data to improve the performance of the Mean Teacher method. Moreover, the Mean Teacher's consistency cost reduces the impact of noise introduced into the pseudo-labels by detection errors. Experimental results on Task 4 of the DCASE2020 challenge demonstrate the superiority of the proposed method, which achieves an F1-score of about 44.25% on the public evaluation set, significantly outperforming the baseline system's 32.39%. In addition, we propose a simple and effective experiment, called the Variable Order Input (VOI) experiment, which demonstrates the significance of the Couple Learning method. Our Couple Learning code is available on GitHub.
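For readers unfamiliar with Mean Teacher, here is a minimal, framework-free sketch of its two generic ingredients, assuming toy model parameters and predictions stored as plain Python lists: the exponential-moving-average (EMA) update that moves the teacher's weights toward the student's, and a mean-squared-error consistency cost between their predictions. It is not the authors' Couple Learning code and does not reproduce the PLG model; the names `ema_update`, `consistency_cost` and the default `alpha=0.999` are illustrative assumptions.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float], alpha: float = 0.999) -> List[float]:
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of parameters")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # The teacher is a slowly moving average of the student (self-ensembling).
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_cost(student_probs: List[float], teacher_probs: List[float]) -> float:
    """Mean squared error between student and teacher predictions for the same input."""
    if len(student_probs) != len(teacher_probs) or not student_probs:
        raise ValueError("prediction vectors must be non-empty and of equal length")
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / len(student_probs)


if __name__ == "__main__":
    # Toy values only: two "parameters" and two class probabilities.
    print(ema_update([0.0, 1.0], [1.0, 0.0], alpha=0.9))
    print(consistency_cost([0.8, 0.2], [0.6, 0.4]))
```

In a full training loop the consistency cost would typically be added to a supervised loss on labeled (or, in this paper's setting, pseudo-labeled) clips, and `ema_update` would run after each student optimizer step.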

Preprint · 2022 · arXiv

Photonic p-orbital higher-order topological insulators

The orbital degrees of freedom play a pivotal role in understanding fundamental phenomena in solid-state materials as well as exotic quantum states of matter including orbital superfluidity and topological semimetals. Despite tremendous efforts in engineering synthetic cold-atom, electronic and photonic lattices to explore orbital physics, thus far high orbitals in an important class of materials, namely, the higher-order topological insulators (HOTIs), have not been realized. Here, we demonstrate p-orbital corner states in a photonic HOTI, unveiling their underlying topological invariant, symmetry protection, and nonlinearity-induced dynamical rotation. In a Kagome-type HOTI, we find that topological protection of the p-orbital corner states demands an orbital-hopping symmetry, in addition to the generalized chiral symmetry. Due to orbital hybridization, the nontrivial topology of the p-orbital HOTI is hidden if bulk polarization is used as the topological invariant, but well manifested by the generalized winding number. Our work opens a pathway for the exploration of intriguing orbital phenomena mediated by higher band topology applicable to a broad spectrum of systems.

Preprint · 2022 · arXiv

SP-SEDT: Self-supervised Pre-training for Sound Event Detection Transformer

Recently, an event-based end-to-end model, the Sound Event Detection Transformer (SEDT), was proposed for sound event detection (SED) and achieves competitive performance. However, compared with frame-based models, it requires more training data with temporal annotations to improve its localization ability. Synthetic data is an alternative, but it suffers from a large domain gap with real recordings. Inspired by the success of UP-DETR in object detection, we propose self-supervised pre-training of SEDT (SP-SEDT) by detecting random patches that are cropped only along the time axis. Experiments on the DCASE2019 Task 4 dataset show that the proposed SP-SEDT can outperform a fine-tuned frame-based model. An ablation study is also conducted to investigate the impact of different loss functions and patch sizes.
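The pretext task described above (random patches cropped only along the time axis) can be illustrated with a small sketch. It assumes a spectrogram stored as a list of frames, each a list of frequency-bin values, and returns the cropped patch together with its normalized (center, width) on the time axis, the kind of 1-D target a detector could be trained to localize. The function name, the frame-count bounds and the (center, width) encoding are assumptions for illustration, not the SP-SEDT implementation.

```python
import random
from typing import List, Tuple

# Hypothetical representation: frames x frequency bins.
Spectrogram = List[List[float]]


def crop_random_time_patch(
    spec: Spectrogram, min_frames: int, max_frames: int, rng: random.Random
) -> Tuple[Spectrogram, Tuple[float, float]]:
    """Crop a random patch along the time axis only, keeping every frequency bin.

    Returns the patch and its (center, width), both normalized by the clip length.
    """
    n_frames = len(spec)
    if not 1 <= min_frames <= max_frames <= n_frames:
        raise ValueError("need 1 <= min_frames <= max_frames <= number of frames")
    width = rng.randint(min_frames, max_frames)  # randint is inclusive on both ends
    start = rng.randint(0, n_frames - width)     # guarantees the patch fits in the clip
    patch = [frame[:] for frame in spec[start:start + width]]  # copy, full frequency range
    center = (start + width / 2) / n_frames
    return patch, (center, width / n_frames)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 100-frame, 4-bin "spectrogram" whose values encode the frame index.
    spec = [[float(t)] * 4 for t in range(100)]
    patch, (center, width) = crop_random_time_patch(spec, 10, 30, rng)
    print(len(patch), round(center, 3), round(width, 3))
```

Cropping only in time keeps each patch's full spectral content, so the model is asked to recover where a segment sits in time rather than in frequency, matching the localization ability the abstract targets.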

Preprint · 2020 · arXiv

Guided learning for weakly-labeled semi-supervised sound event detection

We propose a simple but efficient method, termed Guided Learning, for weakly-labeled semi-supervised sound event detection (SED). Weakly-labeled SED implies two sub-targets: audio tagging and boundary detection. Instead of designing a single model that trades off between the two sub-targets, we design a teacher model aimed at audio tagging to guide a student model aimed at boundary detection in learning from the unlabeled data. The guidance is guaranteed by the gap in audio tagging performance between the two models. Meanwhile, the student model, freed from the trade-off, is able to provide better boundary detection results. We propose a principle for designing the two models based on the relation between the temporal compression scale and the two sub-targets. We also propose an end-to-end semi-supervised learning process that enables the two models' abilities to improve alternately. Experiments on the DCASE2018 Task 4 dataset show that our approach achieves competitive performance.

Preprint · 2020 · arXiv

Multi-Branch Learning for Weakly-Labeled Sound Event Detection

There are two sub-tasks implied in weakly-supervised SED: audio tagging and event boundary detection. Current methods that combine multi-task learning with SED require annotations for both sub-tasks. Since only audio tagging annotations are available in weakly-supervised SED, we design multiple branches with different learning purposes instead of pursuing multiple tasks. As with multiple tasks, multiple different learning purposes can prevent the feature shared by the branches from overfitting to any single learning purpose. We design these learning purposes based on combinations of different multiple instance learning (MIL) strategies and different pooling methods. Experiments on both the DCASE 2018 Task 4 dataset and the URBAN-SED dataset show that our method achieves competitive performance.
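As a hedged illustration of "different pooling methods", the sketch below implements four pooling functions commonly used in the weakly-supervised SED literature (max, average, linear softmax and exponential softmax), each turning frame-level probabilities for one event class into a clip-level probability. It runs them as parallel "branches" over the same frame outputs to mimic heads that share one feature. Which MIL strategies and pooling functions the paper actually combines is not stated here; the `BRANCHES` dictionary and all function names are hypothetical.

```python
import math
from typing import Callable, Dict, List


def max_pooling(frame_probs: List[float]) -> float:
    """Clip probability = the most confident frame."""
    return max(frame_probs)


def average_pooling(frame_probs: List[float]) -> float:
    """Clip probability = mean over frames."""
    return sum(frame_probs) / len(frame_probs)


def linear_softmax_pooling(frame_probs: List[float]) -> float:
    """Each frame weighted by its own probability: sum(p * p) / sum(p)."""
    total = sum(frame_probs)
    return 0.0 if total == 0.0 else sum(p * p for p in frame_probs) / total


def exp_softmax_pooling(frame_probs: List[float]) -> float:
    """Each frame weighted by exp(p): sum(p * exp(p)) / sum(exp(p))."""
    weights = [math.exp(p) for p in frame_probs]
    return sum(w * p for w, p in zip(weights, frame_probs)) / sum(weights)


# Hypothetical branches: different poolings applied to the same frame-level output.
BRANCHES: Dict[str, Callable[[List[float]], float]] = {
    "max": max_pooling,
    "average": average_pooling,
    "linear_softmax": linear_softmax_pooling,
    "exp_softmax": exp_softmax_pooling,
}


def clip_probabilities(frame_probs: List[float]) -> Dict[str, float]:
    """Apply every branch's pooling to one class's frame-level probabilities."""
    if not frame_probs:
        raise ValueError("need at least one frame")
    return {name: pool(frame_probs) for name, pool in BRANCHES.items()}


if __name__ == "__main__":
    # Toy frame-level probabilities for one event class in one clip.
    print(clip_probabilities([0.1, 0.9, 0.2, 0.05]))
```

Max pooling rewards the single strongest frame while average pooling spreads credit over all frames; the softmax variants sit between them, which is one reason such poolings push a shared feature in different directions.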

Preprint · 2020 · arXiv

Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection

In this paper, a specialized decision surface for weakly-supervised sound event detection (SED) and a disentangled feature (DF) for the multi-label problem in polyphonic SED are proposed. We approach SED as a multiple instance learning (MIL) problem and solve it with a neural network framework that includes a pooling module. General MIL approaches fall into two kinds: instance-level approaches and embedding-level approaches. Embedding-level approaches tend to outperform instance-level approaches in bag-level classification but, in current approaches, cannot provide instance-level probabilities; we present a method for generating instance-level probabilities for them. Moreover, we propose a specialized decision surface (SDS) for embedding-level attention pooling. We analyze and explain, from the perspective of the high-level feature space, why an embedding-level attention module with SDS is better than other typical pooling modules. To address the imbalanced dataset and the co-occurrence of multiple categories in the polyphonic event detection task, we propose a DF that reduces interference among categories; it optimizes the high-level feature space by disentangling it based on class-wise identifiable information, obtaining multiple different subspaces. Experiments on the DCASE 2018 Task 4 dataset show that the proposed SDS and DF significantly improve the detection performance of the embedding-level MIL approach with an attention pooling module and outperform the first-place system in the challenge by 6.6 percentage points.
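To make "embedding-level attention pooling" concrete, here is a minimal sketch of the generic mechanism for a single event class, assuming frame embeddings stored as lists of floats. Each frame embedding x_t receives a score from a learned vector, a softmax over time turns the scores into weights a_t, the bag (clip) embedding is the weighted sum of x_t, and a logistic classifier maps it to a bag-level probability. It does not implement the paper's SDS, DF, or its method for instance-level probabilities; `attn_vector`, `classifier` and `bias` are hypothetical learned parameters.

```python
import math
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (shift by the maximum before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def attention_pool(frames: List[List[float]], attn_vector: List[float]) -> Tuple[List[float], List[float]]:
    """Pool frame embeddings into one bag embedding; returns (bag_embedding, weights)."""
    if not frames:
        raise ValueError("need at least one frame embedding")
    weights = softmax([dot(attn_vector, x) for x in frames])  # a_t, sums to 1 over time
    dim = len(frames[0])
    bag = [sum(a * x[d] for a, x in zip(weights, frames)) for d in range(dim)]
    return bag, weights


def clip_probability(bag_embedding: List[float], classifier: List[float], bias: float) -> float:
    """Bag-level probability from a logistic classifier on the pooled embedding."""
    return sigmoid(dot(classifier, bag_embedding) + bias)


if __name__ == "__main__":
    # Toy 3-frame clip with 2-dimensional frame embeddings (hypothetical values).
    frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    bag, weights = attention_pool(frames, attn_vector=[2.0, 0.0])
    print(weights, bag, clip_probability(bag, classifier=[1.0, -1.0], bias=0.0))
```

Because the classifier sees only the pooled embedding, this design naturally yields bag-level probabilities but no per-frame probabilities, which is the gap in embedding-level approaches that the abstract says the paper addresses.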
