Source author record

Daniel Aalto

Daniel Aalto appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: Sound; math.FA; math.MG; Biological Physics; Computation and Language; math.DS; physics.class-ph; physics.med-ph; Tissues and Organs

Catalog footprint

7 works

9 topics

4 close collaborators

Preprint, 2015, arXiv

Hierarchical Representation of Prosody for Statistical Speech Synthesis

Prominences and boundaries are the essential constituents of prosodic structure in speech. They provide means to chunk the speech stream into linguistically relevant units by giving them relative saliences and demarcating them within coherent utterance structures. Prominences and boundaries have both been widely used in basic research on prosody as well as in text-to-speech synthesis. However, there are no representation schemes that would provide for both estimating and modelling them in a unified fashion. Here we present an unsupervised unified account for estimating and representing prosodic prominences and boundaries using a scale-space analysis based on the continuous wavelet transform. The methods are evaluated and compared to earlier work using the Boston University Radio News corpus. The results show that the proposed method is comparable with the best published supervised annotation methods.

Preprint, 2015, arXiv

Modal locking between vocal fold and vocal tract oscillations: Experiments and statistical analysis

The human vocal folds are known to interact with the vocal tract acoustics during voiced speech production; namely, a nonlinear source-filter coupling has been observed both by using models and in in vivo phonation. These phenomena are approached from two directions in this article. We first present a computational dynamical model of the speech apparatus that contains an explicit filter-source feedback mechanism from the vocal tract acoustics back to the vocal fold oscillations. The model was used to simulate vocal pitch glides where the trajectory was forced to cross the lowest vocal tract resonance, i.e., the lowest formant $F_1$. Similar patterns produced by human participants were then studied. Both the simulations and the experimental results reveal an effect when the glides cross the first formant (as may happen in [i]). Conversely, this effect is not observed if there is no formant within the glide range (as is the case in [ɑ]). The experiments show a smaller effect compared to the simulations, pointing to an active compensation mechanism.

Preprint, 2015, arXiv

Spectral Study of the Vocal Tract in Vowel Synthesis: A Comparison between 1D and 3D Acoustic Analysis

A state-of-the-art 1D acoustic synthesizer has been previously developed and coupled to speaker-specific biomechanical models of the oropharynx in ArtiSynth. As expected, the formant frequencies of the synthesized vowel sounds were shown to be different from those of the recorded audio. This discrepancy was hypothesized to be due to the simplified geometry of the vocal tract model as well as the one-dimensional implementation of the Navier-Stokes equations. In this paper, we calculate Helmholtz resonances of our vocal tract geometries using the 3D finite element method (FEM), and compare them with the formant frequencies obtained from the 1D method and audio. We hope such a comparison helps clarify the limitations of our current models and/or speech synthesizer.

Preprint, 2013, arXiv

Measurement of acoustic and anatomic changes in oral and maxillofacial surgery patients

We describe an arrangement for simultaneous recording of speech and geometry of the vocal tract in patients undergoing surgery involving this area. The experimental design is considered from an articulatory phonetic point of view. The speech and noise signals are recorded with an acoustic-electrical arrangement. The vocal tract is simultaneously imaged with MRI. A MATLAB-based system controls the timing of speech recording and MR image acquisition. The speech signals are cleaned of acoustic MRI noise by a non-linear signal processing algorithm. Finally, a vowel data set from pilot experiments is compared with validation data from an anechoic chamber as well as with Helmholtz resonances of the vocal tract volume.

Preprint, 2012, arXiv

How far are vowel formants from computed vocal tract resonances?

We compare numerically computed resonances of the human vocal tract with formants that have been extracted from speech during vowel pronunciation. The geometry of the vocal tract has been obtained by MRI from a male subject, and the corresponding speech has been recorded simultaneously. The resonances are computed by solving the Helmholtz partial differential equation with the Finite Element Method (FEM). Despite a rudimentary exterior space acoustics model, i.e., the Dirichlet boundary condition at the mouth opening, the computed resonance structure differs from the measured formant structure by $\approx$ 0.7 semitones for [i] and [u], which have a small mouth opening area, and by $\approx$ 3 semitones for the vowels [a] and [ae], which have a larger mouth opening. The contribution of the possibly open velar port has not been taken into consideration at all, which adds to the discrepancy for [a] in the present data set. We conclude that by improving the exterior space model and properly treating the velar port opening, it is possible to computationally attain the four lowest vowel formants with an error of less than a semitone. The corresponding wave equation model on MRI-produced vocal tract geometries is expected to have a comparable accuracy.
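The semitone figures quoted in this abstract can be read with a small sketch. It assumes the standard equal-tempered definition of an interval, 12 times the base-2 logarithm of a frequency ratio; the abstract does not say how the discrepancy is aggregated over several formants, so the sketch compares one formant with one resonance. The function name and the example frequencies are hypothetical and are not taken from the paper.

```python
import math


def semitone_distance(f_measured_hz, f_computed_hz):
    """Signed interval, in semitones, from a computed resonance to a measured formant.

    Assumes the equal-tempered definition: 12 * log2(f_measured / f_computed).
    """
    if f_measured_hz <= 0 or f_computed_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_measured_hz / f_computed_hz)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the paper.
    measured = 300.0  # measured formant, Hz
    computed = 288.0  # computed resonance, Hz
    print(f"{semitone_distance(measured, computed):.2f} semitones")
```

Because the measure is logarithmic, the same frequency ratio gives the same distance at any pitch, which makes errors on low and high formants directly comparable.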

Preprint, 2010, arXiv

John-Nirenberg lemmas for a doubling measure

We study, in the context of doubling metric measure spaces, a class of BMO type functions defined by John and Nirenberg. In particular, we present a new version of the Calderón-Zygmund decomposition in metric spaces and use it to prove the corresponding John-Nirenberg inequality.

Preprint, 2010, arXiv

Weak $L^{\infty}$ and BMO in metric spaces

Bennett, DeVore and Sharpley introduced the space weak $L^{\infty}$ in 1981 and studied its relationship with functions of bounded mean oscillation. Here we characterize weak $L^{\infty}$ in measure spaces without using the decreasing rearrangement of a function. Instead, we obtain exponential estimates for the distribution function. In addition, we consider a localized version of the characterization that leads to a new characterization of BMO.
