Source author record

Alexander Jung

Alexander Jung appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Information Theory math.IT math.ST Statistics Theory Artificial Intelligence Computation Distributed, Parallel, and Cluster Computing math.OC math.PR Operating Systems

Catalog footprint

What is connected

21works

11topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

$α$-TCAV: A Unified Framework for Testing with Concept Activation Vectors

Concept Activation Vectors (CAVs) are a fundamental tool for concept-based explainability in deep learning, yet their practical utility is limited by statistical instability. We analyze the stochastic nature of CAVs and the Testing with CAVs (TCAV) method, deriving the distributions of major CAV classes including PatternCAV, FastCAV, and ridge regression-based CAVs. We then identify a fundamental flaw in the standard TCAV score: its reliance on a discontinuous indicator function induces non-decaying variance in critical regimes. To address this, we introduce $α$-TCAV, a generalized framework that replaces the indicator with a parameterized smooth function, yielding a unified probabilistic formulation that subsumes both TCAV and Multi-TCAV. We characterize the induced distributions of sensitivity scores and different TCAV variants, showing that established state-of-the-art choices lack theoretical justification. We provide principled guidance on tuning the parameter in $α$-TCAV -- either to imitate Multi-TCAV at substantially lower computational cost, or to obtain a calibrated Bayes-optimal probabilistic measure of a concept's influence. Finally, our analysis yields practical recommendations that challenge established routines: most notably, allocating the full sampling budget to a single CAV rather than splitting it across several.

preprint2022arXiv

FlexOS: Towards Flexible OS Isolation

At design time, modern operating systems are locked in a specific safety and isolation strategy that mixes one or more hardware/software protection mechanisms (e.g. user/kernel separation); revisiting these choices after deployment requires a major refactoring effort. This rigid approach shows its limits given the wide variety of modern applications' safety/performance requirements, when new hardware isolation mechanisms are rolled out, or when existing ones break. We present FlexOS, a novel OS allowing users to easily specialize the safety and isolation strategy of an OS at compilation/deployment time instead of design time. This modular LibOS is composed of fine-grained components that can be isolated via a range of hardware protection mechanisms with various data sharing strategies and additional software hardening. The OS ships with an exploration technique helping the user navigate the vast safety/performance design space it unlocks. We implement a prototype of the system and demonstrate, for several applications (Redis/Nginx/SQLite), FlexOS' vast configuration space as well as the efficiency of the exploration technique: we evaluate 80 FlexOS configurations for Redis and show how that space can be probabilistically subset to the 5 safest ones under a given performance budget. We also show that, under equivalent configurations, FlexOS performs similarly or better than several baselines/competitors.

preprint2022arXiv

Machine Learning: The Basics

Machine learning (ML) has become a commodity in our every-day lives. We routinely ask ML empowered smartphones to suggest lovely food places or to guide us through a strange place. ML methods have also become standard tools in many fields of science and engineering. A plethora of ML applications transform human lives at unprecedented pace and scale. This book portrays ML as the combination of three basic components: data, model and loss. ML methods combine these three components within computationally efficient implementations of the basic scientific principle "trial and error". This principle consists of the continuous adaptation of a hypothesis about a phenomenon that generates data. ML methods use a hypothesis to compute predictions for future events. We believe that thinking about ML as combinations of three components given by data, model, and loss helps to navigate the steadily growing offer for ready-to-use ML methods. Our three-component picture of ML allows a unified treatment of a wide range of concepts and techniques which seem quite unrelated at first sight. The regularization effect of early stopping in iterative methods is due to the shrinking of the effective hypothesis space. Privacy-preserving ML is obtained by particular choices for the features of data points. Explainable ML methods are characterized by particular choices for the hypothesis space. To make good use of ML tools it is instrumental to understand its underlying principles at different levels of detail. On a lower level, this tutorial helps ML engineers to choose suitable methods for the application at hand. The book also offers a higher-level view on the implementation of ML methods which is typically required to manage a team of ML engineers and data scientists.

preprint2020arXiv

An Information-Theoretic Approach to Personalized Explainable Machine Learning

Automated decision making is used routinely throughout our everyday life. Recommender systems decide which jobs, movies, or other user profiles might be interesting to us. Spell checkers help us to make good use of language. Fraud detection systems decide if a credit card transactions should be verified more closely. Many of these decision making systems use machine learning methods that fit complex models to massive datasets. The successful deployment of machine learning (ML) methods to many (critical) application domains crucially depends on its explainability. Indeed, humans have a strong desire to get explanations that resolve the uncertainty about experienced phenomena like the predictions and decisions obtained from ML methods. Explainable ML is challenging since explanations must be tailored (personalized) to individual users with varying backgrounds. Some users might have received university-level education in ML, while other users might have no formal training in linear algebra. Linear regression with few features might be perfectly interpretable for the first group but might be considered a black-box by the latter. We propose a simple probabilistic model for the predictions and user knowledge. This model allows to study explainable ML using information theory. Explaining is here considered as the task of reducing the "surprise" incurred by a prediction. We quantify the effect of an explanation by the conditional mutual information between the explanation and prediction, given the user background.

preprint2020arXiv

Local Graph Clustering with Network Lasso

We study the statistical and computational properties of a network Lasso method for local graph clustering. The clusters delivered by nLasso can be characterized elegantly via network flows between cluster boundary and seed nodes. While spectral clustering methods are guided by a minimization of the graph Laplacian quadratic form, nLasso minimizes the total variation of cluster indicator signals. As demonstrated theoretically and numerically, nLasso methods can handle very sparse clusters (chain-like) which are difficult for spectral clustering. We also verify that a primal-dual method for nonsmooth optimization allows to approximate nLasso solutions with optimal worst-case convergence rate.

preprint2020arXiv

On the Duality between Network Flows and Network Lasso

Many applications generate data with an intrinsic network structure such as time series data, image data or social network data. The network Lasso (nLasso) has been proposed recently as a method for joint clustering and optimization of machine learning models for networked data. The nLasso extends the Lasso from sparse linear models to clustered graph signals. This paper explores the duality of nLasso and network flow optimization. We show that, in a very precise sense, nLasso is equivalent to a minimum-cost flow problem on the data network structure. Our main technical result is a concise characterization of nLasso solutions via existence of certain network flows. The main conceptual result is a useful link between nLasso methods and basic graph algorithms such as clustering or maximum flow.

preprint2019arXiv

Semi-supervised Learning in Network-Structured Data via Total Variation Minimization

We propose and analyze a method for semi-supervised learning from partially-labeled network-structured data. Our approach is based on a graph signal recovery interpretation under a clustering hypothesis that labels of data points belonging to the same well-connected subset (cluster) are similar valued. This lends naturally to learning the labels by total variation (TV) minimization, which we solve by applying a recently proposed primal-dual method for non-smooth convex optimization. The resulting algorithm allows for a highly scalable implementation using message passing over the underlying empirical graph, which renders the algorithm suitable for big data applications. By applying tools of compressed sensing, we derive a sufficient condition on the underlying network structure such that TV minimization recovers clusters in the empirical graph of the data. In particular, we show that the proposed primal-dual method amounts to maximizing network flows over the empirical graph of the dataset. Moreover, the learning accuracy of the proposed algorithm is linked to the set of network flows between data points having known labels. The effectiveness and scalability of our approach is verified by numerical experiments.

preprint2016arXiv

Learning conditional independence structure for high-dimensional uncorrelated vector processes

We formulate and analyze a graphical model selection method for inferring the conditional independence graph of a high-dimensional nonstationary Gaussian random process (time series) from a finite-length observation. The observed process samples are assumed uncorrelated over time and having a time-varying marginal distribution. The selection method is based on testing conditional variances obtained for small subsets of process components. This allows to cope with the high-dimensional regime, where the sample size can be (drastically) smaller than the process dimension. We characterize the required sample size such that the proposed selection method is successful with high probability.

preprint2016arXiv

Scalable Semi-Supervised Learning over Networks using Nonsmooth Convex Optimization

We propose a scalable method for semi-supervised (transductive) learning from massive network-structured datasets. Our approach to semi-supervised learning is based on representing the underlying hypothesis as a graph signal with small total variation. Requiring a small total variation of the graph signal representing the underlying hypothesis corresponds to the central smoothness assumption that forms the basis for semi-supervised learning, i.e., input points forming clusters have similar output values or labels. We formulate the learning problem as a nonsmooth convex optimization problem which we solve by appealing to Nesterovs optimal first-order method for nonsmooth optimization. We also provide a message passing formulation of the learning method which allows for a highly scalable implementation in big data frameworks.

preprint2015arXiv

Learning the Conditional Independence Structure of Stationary Time Series: A Multitask Learning Approach

We propose a method for inferring the conditional independence graph (CIG) of a high-dimensional Gaussian vector time series (discrete-time process) from a finite-length observation. By contrast to existing approaches, we do not rely on a parametric process model (such as, e.g., an autoregressive model) for the observed random process. Instead, we only require certain smoothness properties (in the Fourier domain) of the process. The proposed inference scheme works even for sample sizes much smaller than the number of scalar process components if the true underlying CIG is sufficiently sparse. A theoretical performance analysis provides conditions which guarantee that the probability of the proposed inference method to deliver a wrong CIG is below a prescribed value. These conditions imply lower bounds on the sample size such that the new method is consistent asymptotically. Some numerical experiments validate our theoretical performance analysis and demonstrate superior performance of our scheme compared to an existing (parametric) approach in case of model mismatch.

preprint2015arXiv

On the Minimax Risk of Dictionary Learning

We consider the problem of learning a dictionary matrix from a number of observed signals, which are assumed to be generated via a linear model with a common underlying dictionary. In particular, we derive lower bounds on the minimum achievable worst case mean squared error (MSE), regardless of computational complexity of the dictionary learning (DL) schemes. By casting DL as a classical (or frequentist) estimation problem, the lower bounds on the worst case MSE are derived by following an established information-theoretic approach to minimax estimation. The main conceptual contribution of this paper is the adaption of the information-theoretic approach to minimax estimation for the DL problem in order to derive lower bounds on the worst case MSE of any DL scheme. We derive three different lower bounds applying to different generative models for the observed signals. The first bound applies to a wide range of models, it only requires the existence of a covariance matrix of the (unknown) underlying coefficient vector. By specializing this bound to the case of sparse coefficient distributions, and assuming the true dictionary satisfies the restricted isometry property, we obtain a lower bound on the worst case MSE of DL schemes in terms of a signal to noise ratio (SNR). The third bound applies to a more restrictive subclass of coefficient distributions by requiring the non-zero coefficients to be Gaussian. While, compared with the previous two bounds, the applicability of this final bound is the most limited it is the tightest of the three bounds in the low SNR regime.

preprint2014arXiv

Compressive Nonparametric Graphical Model Selection For Time Series

We propose a method for inferring the conditional indepen- dence graph (CIG) of a high-dimensional discrete-time Gaus- sian vector random process from finite-length observations. Our approach does not rely on a parametric model (such as, e.g., an autoregressive model) for the vector random process; rather, it only assumes certain spectral smoothness proper- ties. The proposed inference scheme is compressive in that it works for sample sizes that are (much) smaller than the number of scalar process components. We provide analytical conditions for our method to correctly identify the CIG with high probability.

preprint2014arXiv

On the Information-theoretic Limits of Graphical Model Selection for Gaussian Time Series

We consider the problem of inferring the conditional independence graph (CIG) of a multivariate stationary dicrete-time Gaussian random process based on a finite length observation. Using information-theoretic methods, we derive a lower bound on the error probability of any learning scheme for the underlying process CIG. This bound, in turn, yields a minimum required sample-size which is necessary for any algorithm regardless of its computational complexity, to reliably select the true underlying CIG. Furthermore, by analysis of a simple selection scheme, we show that the information-theoretic limits can be achieved for a subclass of processes having sparse CIG. We do not assume a parametric model for the observed process, but require it to have a sufficiently smooth spectral density matrix (SDM).

preprint2014arXiv

Performance Limits of Dictionary Learning for Sparse Coding

We consider the problem of dictionary learning under the assumption that the observed signals can be represented as sparse linear combinations of the columns of a single large dictionary matrix. In particular, we analyze the minimax risk of the dictionary learning problem which governs the mean squared error (MSE) performance of any learning scheme, regardless of its computational complexity. By following an established information-theoretic method based on Fanos inequality, we derive a lower bound on the minimax risk for a given dictionary learning problem. This lower bound yields a characterization of the sample-complexity, i.e., a lower bound on the required number of observations such that consistent dictionary learning schemes exist. Our bounds may be compared with the performance of a given learning scheme, allowing to characterize how far the method is from optimal performance.

preprint2013arXiv

An RKHS Approach to Estimation with Sparsity Constraints

The investigation of the effects of sparsity or sparsity constraints in signal processing problems has received considerable attention recently. Sparsity constraints refer to the a priori information that the object or signal of interest can be represented by using only few elements of a predefined dictionary. Within this thesis, sparsity refers to the fact that a vector to be estimated has only few nonzero entries. One specific field concerned with sparsity constraints has become popular under the name Compressed Sensing (CS). Within CS, the sparsity is exploited in order to perform (nearly) lossless compression. Moreover, this compression is carried out jointly or simultaneously with the process of sensing a physical quantity. In contrast to CS, one can alternatively use sparsity to enhance signal processing methods. Obviously, sparsity constraints can only improve the obtainable estimation performance since the constraints can be interpreted as an additional prior information about the unknown parameter vector which is to be estimated. Our main focus will be on this aspect of sparsity, i.e., we analyze how much we can gain in estimation performance due to the sparsity constraints.

preprint2013arXiv

Compressive Spectral Estimation for Nonstationary Random Processes

Estimating the spectral characteristics of a nonstationary random process is an important but challenging task, which can be facilitated by exploiting structural properties of the process. In certain applications, the observed processes are underspread, i.e., their time and frequency correlations exhibit a reasonably fast decay, and approximately time-frequency sparse, i.e., a reasonably large percentage of the spectral values is small. For this class of processes, we propose a compressive estimator of the discrete Rihaczek spectrum (RS). This estimator combines a minimum variance unbiased estimator of the RS (which is a smoothed Rihaczek distribution using an appropriately designed smoothing kernel) with a compressed sensing technique that exploits the approximate time-frequency sparsity. As a result of the compression stage, the number of measurements required for good estimation performance can be significantly reduced. The measurements are values of the discrete ambiguity function of the observed signal at randomly chosen time and frequency lag positions. We provide bounds on the mean-square estimation error of both the minimum variance unbiased RS estimator and the compressive RS estimator, and we demonstrate the performance of the compressive estimator by means of simulation results. The proposed compressive RS estimator can also be used for estimating other time-dependent spectra (e.g., the Wigner-Ville spectrum) since for an underspread process most spectra are almost equal.

preprint2013arXiv

Minimum Variance Estimation of a Sparse Vector within the Linear Gaussian Model: An RKHS Approach

We consider minimum variance estimation within the sparse linear Gaussian model (SLGM). A sparse vector is to be estimated from a linearly transformed version embedded in Gaussian noise. Our analysis is based on the theory of reproducing kernel Hilbert spaces (RKHS). After a characterization of the RKHS associated with the SLGM, we derive novel lower bounds on the minimum variance achievable by estimators with a prescribed bias function. This includes the important case of unbiased estimation. The variance bounds are obtained via an orthogonal projection of the prescribed mean function onto a subspace of the RKHS associated with the SLGM. Furthermore, we specialize our bounds to compressed sensing measurement matrices and express them in terms of the restricted isometry and coherence parameters. For the special case of the SLGM given by the sparse signal in noise model (SSNM), we derive closed-form expressions of the minimum achievable variance (Barankin bound) and the corresponding locally minimum variance estimator. We also analyze the effects of exact and approximate sparsity information and show that the minimum achievable variance for exact sparsity is not a limiting case of that for approximate sparsity. Finally, we compare our bounds with the variance of three well-known estimators, namely, the maximum-likelihood estimator, the hard-thresholding estimator, and compressive reconstruction using the orthogonal matching pursuit.

preprint2013arXiv

The RKHS Approach to Minimum Variance Estimation Revisited: Variance Bounds, Sufficient Statistics, and Exponential Families

The mathematical theory of reproducing kernel Hilbert spaces (RKHS) provides powerful tools for minimum variance estimation (MVE) problems. Here, we extend the classical RKHS based analysis of MVE in several directions. We develop a geometric formulation of five known lower bounds on the estimator variance (Barankin bound, Cramer-Rao bound, constrained Cramer-Rao bound, Bhattacharyya bound, and Hammersley-Chapman-Robbins bound) in terms of orthogonal projections onto a subspace of the RKHS associated with a given MVE problem. We show that, under mild conditions, the Barankin bound (the tightest possible lower bound on the estimator variance) is a lower semicontinuous function of the parameter vector. We also show that the RKHS associated with an MVE problem remains unchanged if the observation is replaced by a sufficient statistic. Finally, for MVE problems conforming to an exponential family of distributions, we derive novel closed-form lower bound on the estimator variance and show that a reduction of the parameter set leaves the minimum achievable variance unchanged.

preprint2011arXiv

Performance Bounds for Sparse Parametric Covariance Estimation in Gaussian Models

We consider estimation of a sparse parameter vector that determines the covariance matrix of a Gaussian random vector via a sparse expansion into known "basis matrices". Using the theory of reproducing kernel Hilbert spaces, we derive lower bounds on the variance of estimators with a given mean function. This includes unbiased estimation as a special case. We also present a numerical comparison of our lower bounds with the variance of two standard estimators (hard-thresholding estimator and maximum likelihood estimator).

preprint2010arXiv

A Lower Bound on the Estimator Variance for the Sparse Linear Model

We study the performance of estimators of a sparse nonrandom vector based on an observation which is linearly transformed and corrupted by additive white Gaussian noise. Using the reproducing kernel Hilbert space framework, we derive a new lower bound on the estimator variance for a given differentiable bias function (including the unbiased case) and an almost arbitrary transformation matrix (including the underdetermined case considered in compressed sensing theory). For the special case of a sparse vector corrupted by white Gaussian noise-i.e., without a linear transformation-and unbiased estimation, our lower bound improves on previously proposed bounds.

preprint2010arXiv

On Unbiased Estimation of Sparse Vectors Corrupted by Gaussian Noise

We consider the estimation of a sparse parameter vector from measurements corrupted by white Gaussian noise. Our focus is on unbiased estimation as a setting under which the difficulty of the problem can be quantified analytically. We show that there are infinitely many unbiased estimators but none of them has uniformly minimum mean-squared error. We then provide lower and upper bounds on the Barankin bound, which describes the performance achievable by unbiased estimators. These bounds are used to predict the threshold region of practical estimators.

Alexander Jung

What is connected

Connect this record

See the researcher in context

Building this map preview

21 published item(s)

$α$-TCAV: A Unified Framework for Testing with Concept Activation Vectors

FlexOS: Towards Flexible OS Isolation

Machine Learning: The Basics

An Information-Theoretic Approach to Personalized Explainable Machine Learning

Local Graph Clustering with Network Lasso

On the Duality between Network Flows and Network Lasso

Semi-supervised Learning in Network-Structured Data via Total Variation Minimization

Learning conditional independence structure for high-dimensional uncorrelated vector processes

Scalable Semi-Supervised Learning over Networks using Nonsmooth Convex Optimization

Learning the Conditional Independence Structure of Stationary Time Series: A Multitask Learning Approach

On the Minimax Risk of Dictionary Learning

Compressive Nonparametric Graphical Model Selection For Time Series

On the Information-theoretic Limits of Graphical Model Selection for Gaussian Time Series

Performance Limits of Dictionary Learning for Sparse Coding

An RKHS Approach to Estimation with Sparsity Constraints

Compressive Spectral Estimation for Nonstationary Random Processes

Minimum Variance Estimation of a Sparse Vector within the Linear Gaussian Model: An RKHS Approach

The RKHS Approach to Minimum Variance Estimation Revisited: Variance Bounds, Sufficient Statistics, and Exponential Families

Performance Bounds for Sparse Parametric Covariance Estimation in Gaussian Models

A Lower Bound on the Estimator Variance for the Sparse Linear Model

On Unbiased Estimation of Sparse Vectors Corrupted by Gaussian Noise