Source author record

Gernot Kubin

Gernot Kubin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Information Theory, math.IT, nlin.SI, Computation and Language, eess.AS, Sound

Catalog footprint

What is connected

10 works

6 topics

4 close collaborators


Preprint, 2023, arXiv

Using Kaldi for Automatic Speech Recognition of Conversational Austrian German

As dialogue systems become more interactional and social, accurate automatic speech recognition (ASR) of conversational speech is of increasing importance. This shifts the focus from short, spontaneous, task-oriented dialogues to the much higher complexity of casual face-to-face conversations. However, collecting and annotating such conversations is time-consuming, and data is sparse for this specific speaking style. This paper presents ASR experiments with read and conversational Austrian German as targets. To deal with the limited resources available for conversational German and, at the same time, with the large variation among speakers in pronunciation characteristics, we improve a Kaldi-based ASR system by incorporating a (large) knowledge-based pronunciation lexicon, while exploring different data-based methods to restrict the number of pronunciation variants for each lexical entry. We achieve a best WER of 0.4% on Austrian German read speech and a best average WER of 48.5% on conversational speech. We find that using our best pronunciation lexicon yields performance similar to that obtained by increasing the size of the data used for the language model by approx. 360% to 760%. Our findings indicate that for low-resource scenarios, despite the general trend in speech technology towards using data-based methods only, knowledge-based approaches are a successful, efficient method.

Preprint, 2014, arXiv

Information Loss and Anti-Aliasing Filters in Multirate Systems

This work investigates the information loss in a decimation system, i.e., in a downsampler preceded by an anti-aliasing filter. It is shown that, without a specific signal model in mind, the anti-aliasing filter cannot reduce information loss, while, e.g., for a simple signal-plus-noise model it can. For the Gaussian case, the optimal anti-aliasing filter is shown to coincide with the one obtained from energetic considerations. For a non-Gaussian signal corrupted by Gaussian noise, the Gaussian assumption yields an upper bound on the information loss, justifying filter design principles based on second-order statistics from an information-theoretic point of view.

Preprint, 2013, arXiv

Information Measures for Deterministic Input-Output Systems

In this work the information loss in deterministic, memoryless systems is investigated by evaluating the conditional entropy of the input random variable given the output random variable. It is shown that for a large class of systems the information loss is finite, even if the input is continuously distributed. Based on this finiteness, the problem of perfectly reconstructing the input is addressed and Fano-type bounds between the information loss and the reconstruction error probability are derived. For systems with infinite information loss a relative measure is defined and shown to be tightly related to Rényi information dimension. Employing another Fano-type argument, the reconstruction error probability is bounded from below by the relative information loss. With a view to developing a system theory from an information-theoretic point of view, the theoretical results are illustrated by a few example systems, among them a multi-channel autocorrelation receiver.

Preprint, 2013, arXiv

On the Rate of Information Loss in Memoryless Systems

In this work we present results about the rate of (relative) information loss induced by passing a real-valued, stationary stochastic process through a memoryless system. We show that for a special class of systems the information loss rate is closely related to the difference of differential entropy rates of the input and output processes. It is further shown that the rate of (relative) information loss is bounded from above by the (relative) information loss the system induces on a random variable distributed according to the process's marginal distribution. As a side result, in this work we present sufficient conditions such that for a continuous-valued Markovian input process also the output process possesses the Markov property.

Preprint, 2013, arXiv

Signal Enhancement as Minimization of Relevant Information Loss

We introduce the notion of relevant information loss for the purpose of casting the signal enhancement problem in information-theoretic terms. We show that many algorithms from machine learning can be reformulated using relevant information loss, which allows their application to the aforementioned problem. As a particular example we analyze principal component analysis for dimensionality reduction, discuss its optimality, and show that the relevant information loss can indeed vanish if the relevant information is concentrated on a lower-dimensional subspace of the input space.

Preprint, 2012, arXiv

Relative Information Loss - An Introduction

We introduce a relative variant of information loss to characterize the behavior of deterministic input-output systems. We show that the relative loss is closely related to Rényi's information dimension. We provide an upper bound for continuous input random variables and an exact result for a class of functions (comprising quantizers) with infinite absolute information loss. A connection between relative information loss and reconstruction error is investigated.
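
To make the idea of a relative loss concrete, the following is a minimal sketch and not the paper's own construction. It assumes, since the abstract does not spell out the definition, that the relative loss can be read as the ratio H(X_n | Y) / H(X_n) for an increasingly fine n-bit quantization X_n of the input. The input distribution (uniform on [0, 1)), the 1-bit quantizer Y, and the function names are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def relative_loss_one_bit_quantizer(n_bits):
    """Ratio H(X_n | Y) / H(X_n) for a uniform input on [0, 1).

    X_n is the input resolved to n_bits (2**n_bits equiprobable cells).
    Y is a hypothetical 1-bit quantizer that keeps only the top bit.
    """
    cells = 2 ** n_bits
    p_cell = 1.0 / cells

    # Distribution of the quantizer output Y (the top bit of the cell index).
    p_y = defaultdict(float)
    for k in range(cells):
        y = k >> (n_bits - 1)
        p_y[y] += p_cell

    h_x = entropy_bits([p_cell] * cells)
    h_y = entropy_bits(p_y.values())

    # Y is a function of X_n, so H(X_n, Y) = H(X_n) and
    # H(X_n | Y) = H(X_n) - H(Y).
    return (h_x - h_y) / h_x


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, relative_loss_one_bit_quantizer(n))
```

Under these assumptions the absolute loss H(X_n | Y) = n - 1 bits grows without bound as the resolution n increases, while the ratio (n - 1)/n approaches 1, which is the kind of behavior a relative measure is meant to capture for quantizers.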

Preprint, 2012, arXiv

Relative Information Loss in the PCA

In this work we analyze principal component analysis (PCA) as a deterministic input-output system. We show that the relative information loss induced by reducing the dimensionality of the data after performing the PCA is the same as in dimensionality reduction without PCA. Finally, we analyze the case where the PCA uses the sample covariance matrix to compute the rotation. If the rotation matrix is not available at the output, we show that an infinite amount of information is lost. The relative information loss is shown to decrease with increasing sample size.

Preprint, 2011, arXiv

Information Loss in Static Nonlinearities

In this work, conditional entropy is used to quantify the information loss induced by passing a continuous random variable through a memoryless nonlinear input-output system. We derive an expression for the information loss depending on the input density and the nonlinearity and show that the result is strongly related to the non-injectivity of the considered system. Tight upper bounds are presented, which can be evaluated with less difficulty than a direct evaluation of the information loss, which involves the logarithm of a sum. Application of our results is illustrated on a set of examples.
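
As a rough illustration of conditional entropy as a measure of information loss, the sketch below estimates H(X | Y) by Monte Carlo for a square-law device Y = X^2 driven by a Gaussian input. The choice of nonlinearity, the input distribution, and all function names are assumptions made for this example and are not taken from the paper. Because |dY/dX| is the same at x and -x, the posterior probability of the true sign given Y is simply f(x) / (f(x) + f(-x)), and H(X | Y) is the expected negative log of that posterior.

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def square_law_information_loss(mu, n_samples=20_000, seed=0):
    """Monte Carlo estimate of H(X | Y) in bits for Y = X**2, X ~ N(mu, 1).

    Given Y = y, the input is one of the two preimages +sqrt(y), -sqrt(y).
    The estimate averages -log2 of the posterior probability of the
    preimage that was actually drawn.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu, 1.0)
        p_same = normal_pdf(x, mu)      # density at the true input
        p_mirror = normal_pdf(-x, mu)   # density at the other preimage
        posterior = p_same / (p_same + p_mirror)
        total += -math.log2(posterior)
    return total / n_samples


if __name__ == "__main__":
    for mu in (0.0, 1.0, 5.0):
        print(mu, square_law_information_loss(mu))
```

For a zero-mean input the two preimages are equally likely, so exactly one bit (the sign) is lost. As the mean moves away from zero, the sign becomes predictable from the output and the estimated loss shrinks towards zero, which ties the loss to how non-injective the system effectively is for the given input density.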

Preprint, 2011, arXiv

On the Information Loss in Memoryless Systems: The Multivariate Case

In this work we give a concise definition of information loss from a system-theoretic point of view. Based on this definition, we analyze the information loss in static input-output systems subject to a continuous-valued input. For a certain class of multiple-input, multiple-output systems the information loss is quantified. An interpretation of this loss is accompanied by upper bounds which are simple to evaluate. Finally, a class of systems is identified for which the information loss is necessarily infinite. Quantizers and limiters are shown to belong to this class.

Preprint, 2011, arXiv

Some Results on the Information Loss in Dynamical Systems

In this work we investigate the information loss in (nonlinear) dynamical input-output systems and provide some general results. In particular, we present an upper bound on the information loss rate, defined as the (non-negative) difference between the entropy rates of the jointly stationary stochastic processes at the input and output of the system. We further introduce a family of systems with vanishing information loss rate. It is shown that this family contains not only linear filters but also, under certain circumstances, their finite-precision implementations, which typically consist of nonlinear elements.
