Source author record

Tomohiko Nakamura

Tomohiko Nakamura appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Sound, eess.AS, Artificial Intelligence, astro-ph.GA, astro-ph.IM, Machine Learning

Catalog footprint

What is connected: 5 works, 6 topics, 4 close collaborators

Preprint, 2022, arXiv

Differentiable Digital Signal Processing Mixture Model for Synthesis Parameter Extraction from Mixture of Harmonic Sounds

A differentiable digital signal processing (DDSP) autoencoder is a musical sound synthesizer that combines a deep neural network (DNN) and spectral modeling synthesis. It allows us to flexibly edit sounds by changing the fundamental frequency, timbre feature, and loudness (synthesis parameters) extracted from an input sound. However, it is designed for a monophonic harmonic sound and cannot handle mixtures of harmonic sounds. In this paper, we propose a model (DDSP mixture model) that represents a mixture as the sum of the outputs of multiple pretrained DDSP autoencoders. By fitting the output of the proposed model to the observed mixture, we can directly estimate the synthesis parameters of each source. Through synthesis parameter extraction experiments, we show that the proposed method has high and stable performance compared with a straightforward method that applies the DDSP autoencoder to the signals separated by an audio source separation method.
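The idea of modeling a mixture as the sum of several synthesizer outputs, and fitting that sum to the observed signal, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the pretrained DDSP autoencoders are replaced by a hand-written additive harmonic synthesizer, the fundamental frequencies are treated as known, only harmonic amplitudes are estimated, and the fit uses plain gradient descent on a time-domain squared error. It is not the authors' implementation.

```python
import math

SR = 8000  # sample rate in Hz (illustrative)
N = 800    # 0.1 s of audio, so every multiple of 10 Hz fits whole cycles

def harmonic_basis(f0, n_harm):
    """Unit-amplitude sinusoids at f0, 2*f0, ..., n_harm*f0."""
    return [[math.sin(2 * math.pi * (k + 1) * f0 * t / SR) for t in range(N)]
            for k in range(n_harm)]

def synthesize(amps, basis):
    """Toy stand-in for one synthesizer: amplitude-weighted sum of harmonics."""
    return [sum(a * b[t] for a, b in zip(amps, basis)) for t in range(N)]

def mix(all_amps, all_bases):
    """Mixture model: sample-wise sum of the per-source synthesizer outputs."""
    sources = [synthesize(a, b) for a, b in zip(all_amps, all_bases)]
    return [sum(s) for s in zip(*sources)]

def fit(observed, all_bases, n_harm, steps=60, lr=0.5):
    """Estimate all sources' amplitudes jointly by gradient descent on the MSE."""
    est = [[0.0] * n_harm for _ in all_bases]
    for _ in range(steps):
        model = mix(est, all_bases)
        resid = [m - o for m, o in zip(model, observed)]
        # Gradient of the mean squared error with respect to each amplitude,
        # computed for every source before any parameter is updated.
        grads = [[2.0 / N * sum(r * x for r, x in zip(resid, b)) for b in basis]
                 for basis in all_bases]
        est = [[a - lr * g for a, g in zip(amps, gs)]
               for amps, gs in zip(est, grads)]
    return est

# Two hypothetical harmonic sources with assumed-known fundamentals.
f0s = [200.0, 310.0]
true_amps = [[1.0, 0.5, 0.25], [0.8, 0.1, 0.4]]
bases = [harmonic_basis(f0, 3) for f0 in f0s]
observed = mix(true_amps, bases)
estimated = fit(observed, bases, n_harm=3)
```

Because the parameters of both sources are fitted against the mixture directly, no separated signals are needed as an intermediate step, which is the contrast the abstract draws with the separation-then-analysis baseline.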

Preprint, 2022, arXiv

SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling

We present a self-supervised speech restoration method that requires no paired speech corpora. Because the previous general speech restoration method uses artificial paired data created by applying various distortions to high-quality speech corpora, it cannot sufficiently represent the acoustic distortions of real data, which limits its applicability. Our model consists of analysis, synthesis, and channel modules that simulate the recording process of degraded speech, and it is trained with real degraded speech data in a self-supervised manner. The analysis module extracts distortionless speech features and distortion features from degraded speech, the synthesis module synthesizes the restored speech waveform, and the channel module adds distortions to the speech waveform. Our model also enables audio effect transfer, in which only the acoustic distortions are extracted from degraded speech and added to arbitrary high-quality audio. Experimental evaluations with both simulated and real data show that our method achieves significantly higher-quality speech restoration than the previous supervised method, suggesting its applicability to real degraded speech materials.

Preprint, 2015, arXiv

ANIR : Atacama Near-Infrared Camera for the 1.0-m miniTAO Telescope

We have developed a near-infrared camera called ANIR (Atacama Near-InfraRed camera) for the University of Tokyo Atacama Observatory 1.0 m telescope (miniTAO) installed at the summit of Cerro Chajnantor (5640 m above sea level) in northern Chile. The camera provides a field of view of 5'.1 $\times$ 5'.1 with a spatial resolution of 0".298/pixel in the wavelength range of 0.95 to 2.4 $μ$m. Taking advantage of the dry site, the camera is capable of hydrogen Paschen-$α$ (Pa$α$, $λ=$1.8751 $μ$m in air) narrow-band imaging observations, a wavelength at which ground-based observations have been quite difficult because of deep atmospheric absorption, mainly from water vapor. We have been successfully obtaining Pa$α$ images of Galactic objects and nearby galaxies with ANIR since the first-light observation in 2009. The throughputs at the narrow-band filters ($N1875$, $N191$), including the atmospheric absorption, show larger dispersion (~10%) than those at the broad-band filters (a few %), indicating that they are affected by temporal fluctuations in precipitable water vapor (PWV) above the site. We evaluate the PWV content via the atmospheric transmittance at the narrow-band filters, and find that the median and the dispersion of the PWV distribution are 0.40+/-0.30 mm for $N1875$ and 0.37+/-0.21 mm for $N191$, which are remarkably lower (49+/-38% for $N1875$ and 59+/-26% for $N191$) than radiometry measurements at the base of Cerro Chajnantor (5100 m alt.). The decrease in PWV can be explained by the altitude of the site if we assume that the vertical distribution of the water vapor is approximated by an exponential profile with scale heights within 0.3-1.9 km (previously observed values at night). We thus conclude that miniTAO/ANIR at the summit of Cerro Chajnantor indeed provides an excellent capability for "ground-based" Pa$α$ observation.
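The altitude argument in this abstract can be checked with a short calculation. The sketch below assumes that water vapor density falls off as exp(-z/H) with scale height H, so that the column above altitude z scales the same way and the summit-to-base PWV ratio is exp(-Δz/H). The function names are hypothetical, and only the central values of the quoted ratios are used; their uncertainties are ignored.

```python
import math

def pwv_ratio(z_low_m, z_high_m, scale_height_km):
    """Ratio PWV(z_high) / PWV(z_low) for an exponential vertical profile."""
    dz_km = (z_high_m - z_low_m) / 1000.0
    return math.exp(-dz_km / scale_height_km)

def implied_scale_height_km(z_low_m, z_high_m, ratio):
    """Scale height that would produce the given PWV ratio."""
    dz_km = (z_high_m - z_low_m) / 1000.0
    return -dz_km / math.log(ratio)

BASE_M, SUMMIT_M = 5100, 5640  # altitudes quoted in the abstract

# Expected summit/base ratios at the two ends of the quoted scale-height range.
ratio_min = pwv_ratio(BASE_M, SUMMIT_M, 0.3)
ratio_max = pwv_ratio(BASE_M, SUMMIT_M, 1.9)

# Scale heights implied by the central measured ratios (49% and 59%).
h_n1875 = implied_scale_height_km(BASE_M, SUMMIT_M, 0.49)
h_n191 = implied_scale_height_km(BASE_M, SUMMIT_M, 0.59)

print(f"expected ratio range: {ratio_min:.2f} to {ratio_max:.2f}")
print(f"implied scale heights: {h_n1875:.2f} km and {h_n191:.2f} km")
```

Under these assumptions the expected ratio runs from roughly 17% to 75%, which brackets both measured central values, and the implied scale heights (about 0.76 km and 1.02 km) fall inside the quoted 0.3-1.9 km range.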

Preprint, 2014, arXiv

Ground-based Pa$α$ Narrow-band Imaging of Local Luminous Infrared Galaxies I: Star Formation Rates and Surface Densities

Luminous infrared galaxies (LIRGs) are enshrouded by a large amount of dust produced by their active star formation, which makes it difficult to measure their activity at optical wavelengths. We have carried out Pa$α$ narrow-band imaging observations of 38 nearby star-forming galaxies, including 33 LIRGs listed in the $IRAS$ RBGS catalog, with the Atacama Near InfraRed camera (ANIR) on the University of Tokyo Atacama Observatory (TAO) 1.0 m telescope (miniTAO). Star formation rates (SFRs) estimated from the Pa$α$ fluxes, corrected for dust extinction using the Balmer Decrement Method (typically $A_V$ $\sim$ 4.3 mag), show a good correlation with those from the bolometric infrared luminosity of $IRAS$ data, within a scatter of 0.27 dex. This suggests that the correction of dust extinction for the Pa$α$ flux is sufficient in our sample. We measure the physical sizes and the surface densities of infrared luminosity ($Σ_{L(\mathrm{IR})}$) and $SFR$ ($Σ_{SFR}$) of the star-forming regions of individual galaxies, and find that most of the galaxies follow the sequence of local ultra luminous or luminous infrared galaxies (U/LIRGs) on the $L(\mathrm{IR})$-$Σ_{L(\mathrm{IR})}$ and $SFR$-$Σ_{SFR}$ planes. We confirm that a transition of the sequence from normal galaxies to U/LIRGs is seen at $L(\mathrm{IR})=8\times10^{10}$ $L_{\odot}$. We also find a large scatter in physical size, unlike that of normal galaxies or ULIRGs. Considering that most U/LIRGs are merging or interacting galaxies, this scatter may be caused by strong external factors or by differences in their merging stages.

Preprint, 2014, arXiv

Outer-Product Hidden Markov Model and Polyphonic MIDI Score Following

We present a polyphonic MIDI score-following algorithm capable of following performances with arbitrary repeats and skips, based on a probabilistic model of musical performances. In practical applications of score following it is desirable to handle repeats and skips, which performers may make arbitrarily, but the algorithms previously described in the literature cannot be applied to scores of practical length because of their large computational complexity. We propose a new type of hidden Markov model (HMM) as a performance model that can describe arbitrary repeats and skips, including performer tendencies regarding the score positions before and after them, and derive an efficient score-following algorithm that reduces computational complexity without pruning. We give a theoretical discussion of how much such information on performer tendencies improves the score-following results. The proposed score-following algorithm also tolerates performance mistakes and is shown to be effective in practical situations through evaluations with human performances. The proposed HMM is potentially valuable for other topics in information processing, and we also provide a detailed description of the inference algorithms.
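One way a structured transition model can cut the cost of HMM inference without pruning is sketched below. It assumes, as the model's name suggests but this record does not spell out, that transitions split into a narrow local band (stay or advance) plus a repeat/skip term that factorizes as the outer product of a "stop" vector and a "resume" vector. All names, the band shape, and the probabilities are hypothetical, and band truncation at the final score position is left unnormalized for brevity.

```python
def dense_transitions(n, band, stop, resume):
    """Full n x n matrix: local band moves plus the rank-one repeat/skip term."""
    return [[(1.0 - stop[i]) * band.get(j - i, 0.0) + stop[i] * resume[j]
             for j in range(n)]
            for i in range(n)]

def forward_step_dense(prev, trans, obs_lik):
    """Reference forward update, O(n^2) per observation."""
    n = len(prev)
    return [obs_lik[j] * sum(prev[i] * trans[i][j] for i in range(n))
            for j in range(n)]

def forward_step_fast(prev, band, stop, resume, obs_lik):
    """Same update in O(n * len(band)) by exploiting the outer-product term."""
    n = len(prev)
    # One scalar shared by every destination: total probability of breaking off.
    jump_mass = sum(p * s for p, s in zip(prev, stop))
    out = []
    for j in range(n):
        # Local moves into j come only from the few positions inside the band.
        local = sum(prev[j - d] * (1.0 - stop[j - d]) * w
                    for d, w in band.items() if 0 <= j - d < n)
        out.append(obs_lik[j] * (local + jump_mass * resume[j]))
    return out

# Hypothetical six-position score.
n = 6
prev = [0.1, 0.4, 0.2, 0.1, 0.1, 0.1]   # forward probabilities so far
band = {0: 0.3, 1: 0.7}                 # offset -> probability (stay, advance)
stop = [0.05] * n                       # chance of breaking off at each position
resume = [1.0 / n] * n                  # where playing resumes after a break
obs_lik = [0.2, 0.9, 0.5, 0.1, 0.3, 0.4]

fast = forward_step_fast(prev, band, stop, resume, obs_lik)
dense = forward_step_dense(prev, dense_transitions(n, band, stop, resume), obs_lik)
```

Both functions return the same forward variables, but the fast version never builds or scans the full matrix, so jumps between any pair of positions stay representable while the per-observation cost grows linearly with score length.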
