Researcher profile

Ryo Nakashima

Ryo Nakashima contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
5topics
3close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2019arXiv

Unsupervised Phoneme and Word Discovery from Multiple Speakers using Double Articulation Analyzer and Neural Network with Parametric Bias

This paper describes a new unsupervised machine learning method for simultaneous phoneme and word discovery from multiple speakers. Human infants can acquire knowledge of phonemes and words from interactions with his/her mother as well as with others surrounding him/her. From a computational perspective, phoneme and word discovery from multiple speakers is a more challenging problem than that from one speaker because the speech signals from different speakers exhibit different acoustic features. This paper proposes an unsupervised phoneme and word discovery method that simultaneously uses nonparametric Bayesian double articulation analyzer (NPB-DAA) and deep sparse autoencoder with parametric bias in hidden layer (DSAE-PBHL). We assume that an infant can recognize and distinguish speakers based on certain other features, e.g., visual face recognition. DSAE-PBHL is aimed to be able to subtract speaker-dependent acoustic features and extract speaker-independent features. An experiment demonstrated that DSAE-PBHL can subtract distributed representations of acoustic signals, enabling extraction based on the types of phonemes rather than on the speakers. Another experiment demonstrated that a combination of NPB-DAA and DSAE-PB outperformed the available methods in phoneme and word discovery tasks involving speech signals with Japanese vowel sequences from multiple speakers.

preprint2016arXiv

Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition from Continuous Speech Signals

Human infants can discover words directly from unsegmented speech signals without any explicitly labeled data. In this paper, we develop a novel machine learning method called nonparametric Bayesian double articulation analyzer (NPB-DAA) that can directly acquire language and acoustic models from observed continuous speech signals. For this purpose, we propose an integrative generative model that combines a language model and an acoustic model into a single generative model called the "hierarchical Dirichlet process hidden language model" (HDP-HLM). The HDP-HLM is obtained by extending the hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) proposed by Johnson et al. An inference procedure for the HDP-HLM is derived using the blocked Gibbs sampler originally proposed for the HDP-HSMM. This procedure enables the simultaneous and direct inference of language and acoustic models from continuous speech signals. Based on the HDP-HLM and its inference procedure, we developed a novel double articulation analyzer. By assuming HDP-HLM as a generative model of observed time series data, and by inferring latent variables of the model, the method can analyze latent double articulation structure, i.e., hierarchically organized latent words and phonemes, of the data in an unsupervised manner. The novel unsupervised double articulation analyzer is called NPB-DAA. The NPB-DAA can automatically estimate double articulation structure embedded in speech signals. We also carried out two evaluation experiments using synthetic data and actual human continuous speech signals representing Japanese vowel sequences. In the word acquisition and phoneme categorization tasks, the NPB-DAA outperformed a conventional double articulation analyzer (DAA) and baseline automatic speech recognition system whose acoustic model was trained in a supervised manner.