Source author record

Hao-Wen Dong

Hao-Wen Dong appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher. Unclaimed source record.

Topics: eess.AS, Sound, cond-mat.mtrl-sci, Machine Learning, physics.app-ph, Artificial Intelligence, Multimedia

Catalog footprint

What is connected

5 works

7 topics

4 close collaborators

Preprint, 2022, arXiv

Improving Choral Music Separation through Expressive Synthesized Data from Sampled Instruments

Choral music separation refers to the task of extracting tracks of voice parts (e.g., soprano, alto, tenor, and bass) from mixed audio. The lack of datasets has impeded research on this topic: previous work has only been able to train and evaluate models on a few minutes of choral music data because of copyright issues and the difficulty of collecting datasets. In this paper, we investigate the use of synthesized training data for the source separation task on real choral music. We make three contributions. First, we provide an automated pipeline for synthesizing choral music data from sampled instrument plugins with controllable options for instrument expressiveness. This produces an 8.2-hour-long choral music dataset from the JSB Chorales Dataset, and additional data can easily be synthesized. Second, we conduct an experiment to evaluate multiple separation models on available choral music separation datasets from previous work. To the best of our knowledge, this is the first experiment to comprehensively evaluate choral music separation. Third, experiments demonstrate that the synthesized choral data is of sufficient quality to improve the model's performance on real choral music datasets. This provides additional experimental statistics and data support for the study of choral music separation.
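As a rough illustration of the task setup in this abstract, the sketch below sums four voice-part signals into one mixture and scores an estimate against a reference with a signal-to-distortion ratio (SDR). The sine-tone stems, the `mix` and `sdr_db` helpers, and the choice of SDR as the metric are all assumptions made for illustration; the abstract does not name the evaluation metric or describe the pipeline's interface.

```python
import math

def mix(stems):
    """Sum equal-length voice-part signals sample by sample into one mixture."""
    return [sum(samples) for samples in zip(*stems.values())]

def sdr_db(reference, estimate):
    """Signal-to-distortion ratio in dB: 10 * log10(|s|^2 / |s - s_hat|^2)."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return float("inf") if error == 0 else 10 * math.log10(signal / error)

# Toy stems: one sine tone per voice part (hypothetical stand-ins for synthesized audio).
sr = 8000  # samples per second; one second of audio per stem
freqs = {"soprano": 523.25, "alto": 392.0, "tenor": 329.63, "bass": 130.81}
stems = {
    name: [math.sin(2 * math.pi * f * n / sr) for n in range(sr)]
    for name, f in freqs.items()
}
mixture = mix(stems)

# A trivial baseline "separator" that assigns a quarter of the mixture to every voice.
baseline = [m / 4 for m in mixture]
for name, stem in stems.items():
    print(name, round(sdr_db(stem, baseline), 2))
```

A trained separation model would replace the baseline; the (mixture, stems) pairs are what a synthesized dataset of this kind supplies for training.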

Preprint, 2020, arXiv

Achromatic metasurfaces with inversely customized dispersion for ultra-broadband acoustic beam engineering

Metasurfaces, ultrathin media with extraordinary wavefront modulation ability, have shown versatile potential in manipulating waves. However, existing acoustic metasurfaces are limited by their narrow-band, frequency-dependent capability, which severely hinders real-world applications that usually require customized dispersion. To address this bottleneck, we report ultra-broadband achromatic metasurfaces that are capable of delivering arbitrary and frequency-independent wave properties by bottom-up topology optimization. We successively demonstrate three ultra-broadband functionalities, including acoustic beam steering, focusing and levitation, featuring record-breaking relative bandwidths of 93.3%, 120% and 118.9%, respectively. All metasurface elements show novel asymmetric geometries containing multiple scatterers, curved air channels and local cavities. Moreover, we reveal that the inversely designed metasurfaces can support integrated internal resonances, bi-anisotropy and multiple scattering, which collectively form the mechanism underpinning the ultra-broadband customized dispersion. Our study opens new horizons for ultra-broadband high-efficiency achromatic functional devices on demand, with promising extension to optical and elastic achromatic metamaterials.
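To put the quoted relative bandwidths in perspective, here is a small sketch that assumes the common definition of relative (fractional) bandwidth: band width divided by center frequency. The abstract does not state which definition the authors use, and the band edges below are hypothetical values chosen only so the arithmetic lands on 120%; they are not the paper's operating band.

```python
def relative_bandwidth(f_low, f_high):
    """Fractional bandwidth (f_high - f_low) / f_center, with f_center the band midpoint."""
    if not 0 < f_low < f_high:
        raise ValueError("expected 0 < f_low < f_high")
    center = (f_low + f_high) / 2
    return (f_high - f_low) / center

def band_edge_ratio(rel_bw):
    """Invert the definition: the ratio f_high / f_low implied by a fractional bandwidth."""
    if not 0 <= rel_bw < 2:
        raise ValueError("fractional bandwidth must lie in [0, 2)")
    return (2 + rel_bw) / (2 - rel_bw)

# Hypothetical band edges in Hz (illustrative only, not taken from the paper).
print(f"{relative_bandwidth(1000.0, 4000.0):.1%}")

# Edge ratios implied by the three bandwidths quoted in the abstract,
# under the assumed definition.
for rel_bw in (0.933, 1.20, 1.189):
    print(rel_bw, round(band_edge_ratio(rel_bw), 2))
```

Under this definition the value is bounded above by 200%, so a figure of 120% corresponds to an upper band edge four times the lower one.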

Preprint, 2020, arXiv

MusPy: A Toolkit for Symbolic Music Generation

In this paper, we present MusPy, an open source Python library for symbolic music generation. MusPy provides easy-to-use tools for essential components in a music generation system, including dataset management, data I/O, data preprocessing and model evaluation. To showcase its potential, we present a statistical analysis of the eleven datasets currently supported by MusPy. Moreover, we conduct a cross-dataset generalizability experiment by training an autoregressive model on each dataset and measuring held-out likelihood on the others, a process that is made easier by MusPy's dataset management system. The results provide a map of domain overlap between various commonly used datasets and show that some datasets contain more representative cross-genre samples than others. Along with the dataset analysis, these results might serve as a guide for choosing datasets in future research. Source code and documentation are available at https://github.com/salu133445/muspy .
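The cross-dataset experiment described in this abstract can be sketched as a train-on-one, evaluate-on-all loop. The version below is a toy stand-in that does not use MusPy itself: a first-order (bigram) model with add-one smoothing replaces the paper's autoregressive model, and the two tiny pitch-sequence "datasets" and the helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict

VOCAB_SIZE = 128  # number of MIDI pitches, used for add-one smoothing

def train_bigram(sequences):
    """Count pitch-to-pitch transitions over all training sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def avg_nll(counts, sequences):
    """Average negative log-likelihood per transition under the smoothed bigram model."""
    total, n = 0.0, 0
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            row = counts.get(prev, Counter())
            prob = (row[nxt] + 1) / (sum(row.values()) + VOCAB_SIZE)
            total -= math.log(prob)
            n += 1
    return total / n

# Hypothetical toy datasets: sequences of MIDI pitch numbers with train/test splits.
datasets = {
    "scales": {"train": [[60, 62, 64, 65, 67, 69, 71, 72]] * 4,
               "test": [[60, 62, 64, 65, 67]]},
    "arpeggios": {"train": [[60, 64, 67, 72, 67, 64, 60]] * 4,
                  "test": [[60, 64, 67, 72]]},
}

# Train one model per dataset, then score every model on every held-out split.
models = {name: train_bigram(d["train"]) for name, d in datasets.items()}
matrix = {
    src: {tgt: avg_nll(models[src], datasets[tgt]["test"]) for tgt in datasets}
    for src in datasets
}
for src, row in matrix.items():
    print(src, {tgt: round(v, 2) for tgt, v in row.items()})
```

Lower off-diagonal values in such a matrix suggest that a model trained on one dataset transfers better to another, which is the kind of domain-overlap map the abstract refers to.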

Preprint, 2019, arXiv

Robust 3D multi-polar acoustic metamaterials with broadband double negativity

Acoustic negative-index metamaterials show promise in achieving superlensing for diagnostic medical imaging. Despite recent progress in this field, most metamaterials suffer from deficiencies such as low spatial symmetry, sophisticated labyrinth topologies and narrow-band features, which make them difficult to use for symmetric subwavelength imaging applications. Here, we propose a category of robust multi-cavity metamaterials and reveal their common double-negative mechanism enabled by multi-polar (dipole, quadrupole and octupole) resonances in both two-dimensional (2D) and three-dimensional (3D) scenarios. In particular, we discover explicit relationships governing the double-negative frequency bounds from an equivalent circuit analogy. Moreover, broadband single-source and double-source subwavelength imaging is realized and verified by 2D and 3D superlenses. More importantly, the analogical 3D superlens can ensure subwavelength imaging in all directions. The proposed multi-polar resonance-enabled robust metamaterials and design methodology open horizons for easier manipulation of subwavelength waves and realization of practical 3D metamaterial devices.

Preprint, 2017, arXiv

MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment

Generating music has a few notable differences from generating images and videos. First, music is an art of time, necessitating a temporal model. Second, music is usually composed of multiple instruments/tracks with their own temporal dynamics, but collectively they unfold over time interdependently. Lastly, musical notes are often grouped into chords, arpeggios or melodies in polyphonic music, so imposing a chronological ordering on notes is not a natural fit. In this paper, we propose three models for symbolic multi-track music generation under the framework of generative adversarial networks (GANs). The three models, which differ in their underlying assumptions and accordingly in their network architectures, are referred to as the jamming model, the composer model and the hybrid model. We trained the proposed models on a dataset of over one hundred thousand bars of rock music and applied them to generate piano-rolls of five tracks: bass, drums, guitar, piano and strings. A few intra-track and inter-track objective metrics are also proposed to evaluate the generative results, in addition to a subjective user study. We show that our models can generate coherent music of four bars right from scratch (i.e., without human input). We also extend our models to human-AI cooperative music generation: given a specific track composed by a human, we can generate four additional tracks to accompany it. All code, the dataset and the rendered audio samples are available at https://salu133445.github.io/musegan/ .
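For readers unfamiliar with the piano-roll format mentioned in this abstract, the sketch below builds a five-track, four-bar binary piano-roll in plain Python and computes an empty-bar ratio per track. The time resolution, the helper names, and the empty-bar ratio as an example of an intra-track metric are assumptions for illustration; the abstract does not specify the resolution or name the metrics.

```python
TRACKS = ["bass", "drums", "guitar", "piano", "strings"]
N_BARS, STEPS_PER_BAR, N_PITCHES = 4, 16, 128  # resolution is hypothetical

def empty_pianoroll():
    """One binary piano-roll per track, indexed as roll[track][bar][step][pitch]."""
    return {
        track: [[[0] * N_PITCHES for _ in range(STEPS_PER_BAR)] for _ in range(N_BARS)]
        for track in TRACKS
    }

def add_note(roll, track, bar, start, duration, pitch):
    """Mark a pitch as active for `duration` steps from `start`, clipped to the bar."""
    for step in range(start, min(start + duration, STEPS_PER_BAR)):
        roll[track][bar][step][pitch] = 1

def empty_bar_ratio(roll, track):
    """Fraction of a track's bars that contain no active notes."""
    empty = sum(1 for bar in roll[track] if not any(any(step) for step in bar))
    return empty / N_BARS

roll = empty_pianoroll()
for bar in range(N_BARS):
    add_note(roll, "bass", bar, 0, 8, 36)  # a held bass note in every bar
# Two simultaneous pitches form a chord: the roll stores them without any note ordering.
add_note(roll, "piano", 0, 0, 4, 60)
add_note(roll, "piano", 0, 0, 4, 64)

for track in TRACKS:
    print(track, empty_bar_ratio(roll, track))
```

Because simultaneous notes occupy the same time step, this representation sidesteps the note-ordering issue the abstract raises for polyphonic music.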