Source author record

Martin Kleinsteuber

Martin Kleinsteuber appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Computer Vision, Information Theory, math.IT, Information Retrieval, Artificial Intelligence, math.DG, Sound

Catalog footprint

What is connected

22 works

8 topics

4 close collaborators


Preprint, 2022, arXiv

On Leveraging Variational Graph Embeddings for Open World Compositional Zero-Shot Learning

Humans are able to identify and categorize novel compositions of known concepts. The task in Compositional Zero-Shot Learning (CZSL) is to learn compositions of primitive concepts, i.e., objects and states, in such a way that even their novel compositions can be classified zero-shot. In this work, we do not assume any prior knowledge on the feasibility of novel compositions, i.e., the open-world setting, where infeasible compositions dominate the search space. We propose a Compositional Variational Graph Autoencoder (CVGAE) approach for learning the variational embeddings of the primitive concepts (nodes) as well as the feasibility of their compositions (via edges). Such modelling makes CVGAE scalable to real-world application scenarios. This is in contrast to the SOTA method, CGE, which is computationally very expensive: e.g., for the benchmark C-GQA dataset, CGE requires 3.94 × 10^5 nodes, whereas CVGAE requires only 1323 nodes. We learn a mapping of the graph and image embeddings onto a common embedding space. CVGAE adopts a deep metric learning approach and learns a similarity metric in this space via a bi-directional contrastive loss between projected graph and image embeddings. We validate the effectiveness of our approach on three benchmark datasets. We also demonstrate via an image retrieval task that the representations learnt by CVGAE are better suited for compositional generalization.
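
A generic form of the bi-directional contrastive loss described above can be sketched in a few lines of Python. This is an illustrative sketch, not the CVGAE implementation: the cosine similarity, the temperature `tau`, the function names, and the assumption that each composition appears at most once per batch are all choices made here for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cross_entropy_rows(logits, targets):
    """Mean softmax cross-entropy of each row of `logits` against its target index."""
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)  # subtract the row maximum for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[t]
    return total / len(logits)

def bidirectional_contrastive_loss(image_emb, graph_emb, labels, tau=0.1):
    """Symmetric contrastive loss (hypothetical sketch).

    image_emb: projected image embeddings, one per batch item.
    graph_emb: projected embeddings of all candidate compositions.
    labels:    index into graph_emb of the correct composition for each image
               (assumed distinct within the batch).
    """
    # Image -> composition: each image is scored against every composition.
    sims = [[cosine(x, g) / tau for g in graph_emb] for x in image_emb]
    loss_i2g = cross_entropy_rows(sims, labels)
    # Composition -> image: the composition of item j is scored against all batch images.
    n = len(image_emb)
    cols = [[sims[i][labels[j]] for i in range(n)] for j in range(n)]
    loss_g2i = cross_entropy_rows(cols, list(range(n)))
    return 0.5 * (loss_i2g + loss_g2i)
```

Averaging the two cross-entropy terms makes the objective symmetric: an image must pick out its composition among all candidates, and a composition must pick out its image among the batch.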

Preprint, 2021, arXiv

Variational Embeddings for Community Detection and Node Representation

In this paper, we study how to simultaneously learn two highly correlated tasks of graph analysis, i.e., community detection and node representation learning. We propose an efficient generative model called VECoDeR for jointly learning Variational Embeddings for Community Detection and node Representation. VECoDeR assumes that every node can be a member of one or more communities. The node embeddings are learned in such a way that connected nodes are not only "closer" to each other but also share similar community assignments. A joint learning framework leverages community-aware node embeddings for better community detection. We demonstrate on several graph datasets that VECoDeR outperforms many competitive baselines on all three tasks, i.e., node classification, overlapping community detection and non-overlapping community detection. We also show that VECoDeR is computationally efficient and performs robustly across varying hyperparameters.

Preprint, 2020, arXiv

Epitomic Variational Graph Autoencoder

The variational autoencoder (VAE) is a widely used generative model for learning latent representations. Burda et al., in their seminal paper, showed that the learning capacity of VAE is limited by over-pruning, a phenomenon where a significant number of latent variables fail to capture any information about the input data and the corresponding hidden units become inactive. This adversely affects learning diverse and interpretable latent representations. As the variational graph autoencoder (VGAE) extends VAE to graph-structured data, it inherits the over-pruning problem. In this paper, we adopt a model-based approach and propose epitomic VGAE (EVGAE), a generative variational framework for graph datasets which successfully mitigates the over-pruning problem and also boosts the generative ability of VGAE. We consider EVGAE to consist of multiple sparse VGAE models, called epitomes, which are groups of latent variables sharing the latent space. This approach helps increase the number of active units, as epitomes compete to learn better representations of the graph data. We verify our claims via experiments on three benchmark datasets. Our experiments show that EVGAE has a better generative ability than VGAE. Moreover, EVGAE outperforms VGAE on the link prediction task in citation networks.
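
Over-pruning is usually quantified by counting active units. The sketch below assumes the activity statistic of Burda et al.: the variance, over the dataset, of each latent dimension's posterior mean, with a dimension counted as active if that variance exceeds 0.01. The function name and input format are hypothetical, and the code only measures the symptom; it does not implement EVGAE.

```python
def active_units(posterior_means, threshold=1e-2):
    """Count latent dimensions whose posterior mean varies across the data.

    posterior_means: list of per-example mean vectors E_q[z | x], one per input x.
    A dimension is 'active' if the (population) variance of its posterior mean
    over the dataset exceeds `threshold`.
    """
    n = len(posterior_means)
    dims = len(posterior_means[0])
    count = 0
    for d in range(dims):
        values = [mu[d] for mu in posterior_means]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        if var > threshold:
            count += 1
    return count
```

An inactive dimension is one whose posterior mean barely depends on the input, so its variance across the data stays near zero even if the model assigns it nonzero values.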

Preprint, 2020, arXiv

Mend The Learning Approach, Not the Data: Insights for Ranking E-Commerce Products

Improved search quality enhances users' satisfaction, which directly impacts the sales growth of an E-Commerce (E-Com) platform. Traditional Learning to Rank (LTR) algorithms require relevance judgments on products. In E-Com, obtaining such judgments poses an immense challenge. In the literature, it has been proposed to employ user feedback (such as clicks, add-to-basket (AtB) clicks and orders) to generate relevance judgments. This is done in two steps: first, query-product pair data are aggregated from the logs, and then order rates etc. are calculated for each pair. In this paper, we advocate the counterfactual risk minimization (CRM) approach, which circumvents the need for relevance judgments and data aggregation and is better suited for learning from logged data, i.e., contextual bandit feedback. Due to the unavailability of a public E-Com LTR dataset, we provide the Mercateo dataset from our platform. It contains more than 10 million AtB click logs and 1 million order logs from a catalogue of about 3.5 million products associated with 3060 queries. To the best of our knowledge, this is the first work that examines the effectiveness of the CRM approach in learning a ranking model from real-world logged data. Our empirical evaluation shows that our CRM approach learns effectively from logged data and beats a strong baseline ranker (λ-MART) by a large margin. Our method outperforms full-information losses (e.g., cross-entropy) on various deep neural network models. These findings demonstrate that by adopting the CRM approach, E-Com platforms can achieve better product search quality than with the full-information approach. The code and dataset can be accessed at: https://github.com/ecom-research/CRM-LTR.
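
Learning from logged bandit feedback rests on importance-weighted estimates of how a new policy would have performed on the logged interactions. The following is a minimal sketch of the standard inverse propensity scoring (IPS) and self-normalized IPS (SNIPS) estimators, assuming each log entry records the reward, the logging policy's propensity, and the new policy's probability for the logged action. It is not the paper's training objective, and the tuple layout is hypothetical.

```python
def ips_estimate(logs):
    """Inverse propensity scoring estimate of a new policy's expected reward.

    logs: list of (reward, logging_propensity, new_policy_prob) tuples, one per
    logged interaction (e.g. reward = 1.0 for an add-to-basket click, else 0.0).
    """
    return sum(r * pi / p for r, p, pi in logs) / len(logs)

def snips_estimate(logs):
    """Self-normalized IPS: divide by the sum of importance weights instead of n."""
    weights = [pi / p for _, p, pi in logs]
    return sum(w * r for w, (r, _, _) in zip(weights, logs)) / sum(weights)
```

The self-normalized variant trades a small bias for lower variance, which matters when some logged propensities are small and plain IPS weights become large.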

Preprint, 2015, arXiv

Learning Co-Sparse Analysis Operators with Separable Structures

In the co-sparse analysis model, a set of filters is applied to a signal from the signal class of interest, yielding sparse filter responses. As such, it may serve as a prior in inverse problems, or for structural analysis of signals that are known to belong to the signal class. The more the model is adapted to the class, the more reliable it is for these purposes. The task of learning such operators for a given class is therefore a crucial problem. In many applications, it is also required that the filter responses are obtained in a timely manner, which can be achieved by filters with a separable structure. Not only can operators of this sort be used efficiently to compute the filter responses, but they also have the advantage that fewer training samples are required to obtain a reliable estimate of the operator. The first contribution of this work is to give theoretical evidence for this claim by providing an upper bound for the sample complexity of the learning process. The second is a stochastic gradient descent (SGD) method designed to learn an analysis operator with separable structures, which includes a novel and efficient step size selection rule. Numerical experiments are provided that link the sample complexity to the convergence speed of the SGD algorithm.
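
The co-sparse analysis model can be made concrete with a toy operator. The sketch below assumes a fixed one-dimensional finite-difference operator rather than a learned one: applied to a piecewise-constant signal, it yields filter responses that are zero everywhere except at the jump, and the positions of the zeros form the co-support. The function names are illustrative.

```python
def finite_difference_operator(n):
    """(n-1) x n analysis operator whose i-th row computes x[i+1] - x[i]."""
    omega = []
    for i in range(n - 1):
        row = [0.0] * n
        row[i], row[i + 1] = -1.0, 1.0
        omega.append(row)
    return omega

def analyze(omega, x):
    """Filter responses Omega @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in omega]

def cosupport(responses, tol=1e-12):
    """Indices of (numerically) zero filter responses."""
    return [i for i, r in enumerate(responses) if abs(r) <= tol]

x = [2.0, 2.0, 2.0, 5.0, 5.0, 5.0]  # piecewise-constant signal with one jump
omega = finite_difference_operator(len(x))
alpha = analyze(omega, x)
print(alpha)             # [0.0, 0.0, 3.0, 0.0, 0.0]
print(cosupport(alpha))  # [0, 1, 3, 4]
```

A learned operator plays the same role as the finite differences here, but its rows are adapted so that signals from the class of interest produce as many zero responses as possible.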

Preprint, 2015, arXiv

Robust Structured Low-Rank Approximation on the Grassmannian

Over the past years, Robust PCA has been established as a standard tool for reliable low-rank approximation of matrices in the presence of outliers. Recently, the Robust PCA approach via nuclear norm minimization has been extended to matrices with linear structures, which appear in applications such as system identification and data series analysis. At the same time, it has been shown how to control the rank of a structured approximation via matrix factorization approaches. The drawbacks of these methods lie either in their lack of robustness against outliers or in their static nature of repeated batch processing. We present a Robust Structured Low-Rank Approximation method on the Grassmannian. On the one hand, it allows fast re-initialization in an online setting thanks to subspace identification with manifolds; on the other hand, it is robust against outliers thanks to a smooth approximation of the $\ell_p$-norm cost function. The method is evaluated on online time series forecasting tasks using simulated and real-world data.

Preprint, 2015, arXiv

Sample Complexity of Dictionary Learning and other Matrix Factorizations

Many modern tools in machine learning and signal processing, such as sparse dictionary learning, principal component analysis (PCA), non-negative matrix factorization (NMF), $K$-means clustering, etc., rely on the factorization of a matrix obtained by concatenating high-dimensional vectors from a training collection. While the idealized task would be to optimize the expected quality of the factors over the underlying distribution of training vectors, in practice it is achieved by minimizing an empirical average over the considered collection. The focus of this paper is to provide sample complexity estimates that uniformly control how much the empirical average deviates from the expected cost function. Standard arguments imply that the performance of the empirical predictor also exhibits such guarantees. The approach is generic enough to encompass several possible constraints on the factors (tensor product structure, shift-invariance, sparsity, etc.), thus providing a unified perspective on the sample complexity of several widely used matrix factorization schemes. The derived generalization bounds scale proportionally to $\sqrt{\log(n)/n}$ w.r.t. the number of samples $n$ for the considered matrix factorization techniques.
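
Read as arithmetic, a bound proportional to $\sqrt{\log(n)/n}$ means that quadrupling the number of samples shrinks the deviation bound to slightly more than half its previous value, since $\sqrt{\log(4n)/(4n)} \big/ \sqrt{\log(n)/n} = \tfrac{1}{2}\sqrt{\log(4n)/\log(n)}$. The short sketch below tabulates this factor; the constants and the dependence on the factor dimensions, which the actual bounds also carry, are deliberately omitted, and the function name is hypothetical.

```python
import math

def rate(n):
    """Decay factor sqrt(log(n) / n) of the generalization bound (constants omitted)."""
    return math.sqrt(math.log(n) / n)

for n in (1_000, 10_000, 100_000):
    # Columns: sample size, decay factor, shrinkage when the sample size is quadrupled.
    print(n, round(rate(n), 4), round(rate(4 * n) / rate(n), 3))
```

The shrinkage factor approaches 1/2 from above as n grows, because the logarithmic term changes more and more slowly.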

Preprint, 2014, arXiv

A Bimodal Co-Sparse Analysis Model for Image Processing

The success of many computer vision tasks lies in the ability to exploit the interdependency between different image modalities such as intensity and depth. Fusing corresponding information can be achieved on several levels, and one promising approach is integration at a low level. Moreover, sparse signal models have been used successfully in many vision applications. Within this area of research, the so-called co-sparse analysis model has attracted considerably less attention than its well-known counterpart, the sparse synthesis model, although it has been proven to be very useful in various image processing applications. In this paper, we propose a co-sparse analysis model that is able to capture the interdependency of two image modalities. It is based on the assumption that a pair of analysis operators exists such that the co-supports of the corresponding bimodal image structures are correlated. We propose an algorithm that is able to learn such a coupled pair of operators from registered and noise-free training data. Furthermore, we explain how this model can be applied to solve linear inverse problems in image processing and how it can be used for image registration tasks. This paper extends previous work by some of the authors with two major contributions. First, a modification of the learning process is proposed that a priori guarantees that the rows of the operator have unit norm and zero mean. This accounts for the intuition that contrast in image modalities carries the most information. Second, the model is used in a novel bimodal image registration algorithm, which estimates the transformation parameters of unregistered images of different modalities.

Preprint, 2014, arXiv

On The Sample Complexity of Sparse Dictionary Learning

In the synthesis model, signals are represented as sparse combinations of atoms from a dictionary. Dictionary learning describes the process of acquiring the underlying dictionary for a given set of training samples. While ideally this would be achieved by optimizing the expectation of the factors over the underlying distribution of the training data, in practice the necessary information about the distribution is not available. Therefore, in real-world applications it is achieved by minimizing an empirical average over the available samples. The main goal of this paper is to provide a sample complexity estimate that controls to what extent the empirical average deviates from the cost function. This estimate in turn indicates the accuracy of the representation achieved by the learned dictionary. The presented approach exemplifies the general results proposed by the authors in "Sample Complexity of Dictionary Learning and other Matrix Factorizations" (Gribonval et al.) and gives more concrete bounds on the sample complexity of dictionary learning. We cover a variety of sparsity measures employed in the learning procedure.

Preprint, 2014, arXiv

Separable Cosparse Analysis Operator Learning

The ability to represent a certain class of signals sparsely has many applications in data analysis, image processing, and other research fields. Among sparse representations, the cosparse analysis model has recently gained increasing interest. Many signals exhibit a multidimensional structure, e.g., images or three-dimensional MRI scans. Most data analysis and learning algorithms use vectorized signals and thereby do not account for this underlying structure. The drawback of not taking the inherent structure into account is a dramatic increase in computational cost. We propose an algorithm for learning a cosparse analysis operator that adheres to the preexisting structure of the data and thus allows for a very efficient implementation. This is achieved by enforcing a separable structure on the learned operator. Our learning algorithm is able to deal with multidimensional data of arbitrary order. We evaluate our method on volumetric data, using three-dimensional MRI scans as an example.
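
The computational benefit of a separable structure follows from the Kronecker identity $(B \otimes A)\,\mathrm{vec}(X) = \mathrm{vec}(A X B^\top)$ for a two-dimensional signal $X$ with column-major vectorization. The pure-Python sketch below is a hypothetical illustration, not the paper's learning algorithm: it checks that applying two small factor matrices gives the same filter responses as the explicit Kronecker operator. With $A$ of size $m_1 \times n_1$, $X$ of size $n_1 \times n_2$ and $B$ of size $m_2 \times n_2$, the dense version needs about $m_1 m_2 n_1 n_2$ multiplications, whereas the separable one needs $m_1 n_1 n_2 + m_1 n_2 m_2$.

```python
def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def kron(a, b):
    """Kronecker product a ⊗ b."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def vec(x):
    """Column-major vectorization (stack the columns of x)."""
    return [x[i][j] for j in range(len(x[0])) for i in range(len(x))]

def separable_apply(a, b, x):
    """Apply B ⊗ A to vec(X) as vec(A X B^T), never forming B ⊗ A."""
    return vec(matmul(matmul(a, x), transpose(b)))

def dense_apply(a, b, x):
    """Same responses via the explicit Kronecker operator (expensive for large signals)."""
    k = kron(b, a)
    v = vec(x)
    return [sum(row[j] * v[j] for j in range(len(v))) for row in k]
```

For higher-order data such as volumetric MRI scans, the same idea extends to one small factor per dimension.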

Preprint, 2014, arXiv

Sparse DOA Estimation of Wideband Sound Sources Using Circular Harmonics

Sparse signal models are the focus of recent developments in narrowband DOA estimation. Applying these methods to localizing audio sources, however, is challenging due to the wideband nature of the signals. The common approach of processing all frequency bands separately and fusing the results is costly and can introduce errors in the solution. We show how these problems can be overcome by decomposing the wavefield of a circular microphone array and using circular harmonic coefficients instead of time-frequency data for sparse DOA estimation. As a result, we present the super-resolution localization method WASCHL (Wideband Audio Sparse Circular Harmonics Localizer), which is inherently frequency-coherent and computationally highly efficient.
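
A property that makes the circular harmonic domain attractive for DOA estimation is that the source direction enters each coefficient only through a phase factor. The sketch below assumes an idealized uniform circular array, a single narrowband plane wave $p(\varphi) = e^{\,i kr\cos(\varphi - \theta)}$, and a DFT over the microphone index. By the Jacobi-Anger expansion, the order-$n$ coefficient is $i^n J_n(kr)\,e^{-in\theta}$, so changing the source azimuth from $0$ to $\theta$ multiplies it by $e^{-in\theta}$. This is not the WASCHL algorithm, and the function names are hypothetical.

```python
import cmath
import math

def circular_harmonics(pressures, orders):
    """Circular harmonic coefficients from a uniform circular array.

    pressures: complex pressures at M microphones placed at angles 2*pi*m/M.
    orders:    harmonic orders n to compute; |n| should stay well below M/2
               so that spatial aliasing is negligible.
    """
    m_count = len(pressures)
    coeffs = {}
    for n in orders:
        acc = 0j
        for m, p in enumerate(pressures):
            phi = 2 * math.pi * m / m_count
            acc += p * cmath.exp(-1j * n * phi)
        coeffs[n] = acc / m_count
    return coeffs

def plane_wave(theta, kr, m_count):
    """Narrowband plane wave from azimuth theta sampled on a circle (normalized radius kr)."""
    return [cmath.exp(1j * kr * math.cos(2 * math.pi * m / m_count - theta))
            for m in range(m_count)]

c_ref = circular_harmonics(plane_wave(0.0, 1.0, 16), range(-3, 4))
c_rot = circular_harmonics(plane_wave(0.7, 1.0, 16), range(-3, 4))
```

Dividing each coefficient by the mode strength $i^n J_n(kr)$ would leave the pure phase $e^{-in\theta}$, which does not depend on frequency; because Bessel functions are not in the Python standard library, the sketch compares two source directions instead.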

Preprint, 2013, arXiv

A Joint Intensity and Depth Co-Sparse Analysis Model for Depth Map Super-Resolution

High-resolution depth maps can be inferred from low-resolution depth measurements and an additional high-resolution intensity image of the same scene. To that end, we introduce a bimodal co-sparse analysis model, which is able to capture the interdependency of registered intensity and depth information. This model is based on the assumption that the co-supports of corresponding bimodal image structures are aligned when computed by a suitable pair of analysis operators. No analytic form of such operators exists, and we propose a method for learning them from a set of registered training signals. This learning process is done offline and returns a bimodal analysis operator that is universally applicable to natural scenes. We use this to exploit the bimodal co-sparse analysis model as a prior for solving inverse problems, which leads to an efficient algorithm for depth map super-resolution.

Preprint, 2013, arXiv

An Adaptive Dictionary Learning Approach for Modeling Dynamical Textures

Video representation is an important and challenging task in the computer vision community. In this paper, we assume that image frames of a moving scene can be modeled as a Linear Dynamical System. We propose a sparse coding framework, named adaptive video dictionary learning (AVDL), to model a video adaptively. The developed framework is able to capture the dynamics of a moving scene by exploiting both sparse properties and the temporal correlations of consecutive video frames. The proposed method is compared with state-of-the-art video processing methods on several benchmark data sequences, which exhibit appearance changes and heavy occlusions.

Preprint, 2013, arXiv

Analysis Based Blind Compressive Sensing

In this work we address the problem of blindly reconstructing compressively sensed signals by exploiting the co-sparse analysis model. In the analysis model it is assumed that a signal multiplied by an analysis operator results in a sparse vector. We propose an algorithm that learns the operator adaptively during the reconstruction process. The arising optimization problem is tackled via a geometric conjugate gradient approach. Different types of sampling noise are handled by simply exchanging the data fidelity term. Numerical experiments are performed for measurements corrupted with Gaussian as well as impulsive noise to show the effectiveness of our method.

Preprint, 2013, arXiv

Analysis Operator Learning and Its Application to Image Reconstruction

Exploiting a priori known structural information lies at the core of many image reconstruction methods that can be stated as inverse problems. The synthesis model, which assumes that images can be decomposed into a linear combination of very few atoms of some dictionary, is now a well-established tool for the design of image reconstruction algorithms. An interesting alternative is the analysis model, where the signal is multiplied by an analysis operator and the outcome is assumed to be sparse. This approach has only recently gained increasing interest. The quality of reconstruction methods based on an analysis model depends heavily on the choice of a suitable operator. In this work, we present an algorithm for learning an analysis operator from training images. Our method is based on $\ell_p$-norm minimization on the set of full-rank matrices with normalized columns. We carefully introduce the employed conjugate gradient method on manifolds and explain the underlying geometry of the constraints. Moreover, we compare our approach to state-of-the-art methods for image denoising, inpainting, and single image super-resolution. Our numerical results show competitive performance of our general approach in all presented applications compared to the specialized state-of-the-art techniques.

Preprint, 2013, arXiv

Co-Sparse Textural Similarity for Image Segmentation

We propose an algorithm for segmenting natural images based on texture and color information, which leverages the co-sparse analysis model for image segmentation within a convex multilabel optimization framework. As a key ingredient of this method, we introduce a novel textural similarity measure, which builds upon the co-sparse representation of image patches. We propose a Bayesian approach to merge textural similarity with information about color and location. Combined with recently developed convex multilabel optimization methods this leads to an efficient algorithm for both supervised and unsupervised segmentation, which is easily parallelized on graphics hardware. The approach provides competitive results in unsupervised segmentation and outperforms state-of-the-art interactive segmentation methods on the Graz Benchmark.

Preprint, 2013, arXiv

pROST: A Smoothed Lp-norm Robust Online Subspace Tracking Method for Realtime Background Subtraction in Video

An increasing number of methods for background subtraction use Robust PCA to identify sparse foreground objects. While many algorithms use the L1-norm as a convex relaxation of the ideal sparsifying function, we approach the problem with a smoothed Lp-norm and present pROST, a method for robust online subspace tracking. The algorithm is based on alternating minimization on manifolds. Implemented on a graphics processing unit, it achieves real-time performance. Experimental results on a state-of-the-art benchmark for background subtraction on real-world video data indicate that the method succeeds in a broad variety of background subtraction scenarios, and it outperforms competing approaches when video quality is degraded by camera jitter.
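
To see why a smoothed Lp-norm with p < 1 favours sparse solutions more strongly than the L1-norm while staying differentiable, consider the cost $\sum_i (x_i^2 + \mu)^{p/2}$ with a small $\mu > 0$. This particular smoothing and the parameter values below are assumptions chosen for illustration and may differ from the exact function used in pROST.

```python
def smoothed_lp(x, p=0.5, mu=1e-6):
    """Smoothed l_p cost sum_i (x_i^2 + mu)^(p/2); differentiable everywhere for mu > 0."""
    return sum((xi * xi + mu) ** (p / 2) for xi in x)

def smoothed_lp_grad(x, p=0.5, mu=1e-6):
    """Gradient of smoothed_lp: d/dx_i = p * x_i * (x_i^2 + mu)^(p/2 - 1)."""
    return [p * xi * (xi * xi + mu) ** (p / 2 - 1) for xi in x]
```

For example, [1, 0] and [0.5, 0.5] have the same L1-norm, but with p = 0.5 the smoothed cost of the sparse vector (about 1.03) is well below that of the dense one (about 1.41). The smoothing parameter trades exactness near zero against well-behaved gradients for manifold-based optimization.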

Preprint, 2013, arXiv

Robust PCA and subspace tracking from incomplete observations using L0-surrogates

Many applications in data analysis rely on the decomposition of a data matrix into a low-rank and a sparse component. Existing methods that tackle this task use the nuclear norm and L1-cost functions as convex relaxations of the rank constraint and the sparsity measure, respectively, or employ thresholding techniques. We propose a method that allows for reconstructing and tracking a subspace of upper-bounded dimension from incomplete and corrupted observations. It does not require any a priori information about the number of outliers. The core of our algorithm is an intrinsic Conjugate Gradient method on the set of orthogonal projection matrices, the so-called Grassmannian. Non-convex sparsity measures are used for outlier detection, which leads to improved performance in terms of robustly recovering and tracking the low-rank matrix. In particular, our approach can cope with more outliers and with an underlying matrix of higher rank than other state-of-the-art methods.

Preprint, 2013, arXiv

Separable Dictionary Learning

Many techniques in computer vision, machine learning, and statistics rely on the fact that a signal of interest admits a sparse representation over some dictionary. Dictionaries are either available analytically or can be learned from a suitable training set. While analytic dictionaries capture the global structure of a signal and allow a fast implementation, learned dictionaries often perform better in applications, as they are more adapted to the considered class of signals. In imagery, unfortunately, the numerical burden of (i) learning a dictionary and (ii) employing the dictionary for reconstruction tasks only allows relatively small image patches to be handled, which capture only local image information. The approach presented in this paper aims at overcoming these drawbacks by allowing a separable structure on the dictionary throughout the learning process. On the one hand, this permits larger patch sizes for the learning phase; on the other hand, the dictionary is applied efficiently in reconstruction tasks. The learning procedure is based on optimizing over a product of spheres, which updates the dictionary as a whole and thus explicitly enforces basic dictionary properties such as mutual coherence during the learning procedure. In the special case where no separable structure is enforced, our method competes with state-of-the-art dictionary learning methods like K-SVD.

Preprint, 2012, arXiv

Averaging Complex Subspaces via a Karcher Mean Approach

We propose a conjugate gradient type optimization technique for the computation of the Karcher mean on the set of complex linear subspaces of fixed dimension, modeled by the so-called Grassmannian. The identification of the Grassmannian with Hermitian projection matrices allows an accessible introduction of the geometric concepts required for an intrinsic conjugate gradient method. In particular, proper definitions of geodesics, parallel transport, and the Riemannian gradient of the Karcher mean function are presented. We provide an efficient step-size selection for the special case of one-dimensional complex subspaces and illustrate how the method can be employed for blind identification via numerical experiments.

Preprint, 2012, arXiv

Uniqueness Analysis of Non-Unitary Matrix Joint Diagonalization

Matrix Joint Diagonalization (MJD) is a powerful approach for solving the Blind Source Separation (BSS) problem. It relies on the construction of matrices which are diagonalized by the unknown demixing matrix. Their joint diagonalizer serves as a correct estimate of this demixing matrix only if it is uniquely determined. Thus, a critical question is under what conditions a joint diagonalizer is unique. In the present work we fully answer this question about the identifiability of MJD based BSS approaches and provide a general result on uniqueness conditions of matrix joint diagonalization. It unifies all existing results which exploit the concepts of non-circularity, non-stationarity, non-whiteness, and non-Gaussianity. As a corollary, we propose a solution for complex BSS, which can be formulated in a closed form in terms of an eigenvalue and a singular value decomposition of two matrices.

Preprint, 2011, arXiv

Blind Source Separation with Compressively Sensed Linear Mixtures

This work studies the problem of simultaneously separating and reconstructing signals from compressively sensed linear mixtures. We assume that all source signals share a common sparse representation basis. The approach combines classical Compressive Sensing (CS) theory with a linear mixing model. It allows the mixtures to be sampled independently of each other. If samples are acquired in the time domain, this means that the sensors need not be synchronized. Since Blind Source Separation (BSS) from a linear mixture is only possible up to permutation and scaling, factoring out these ambiguities leads to a minimization problem on the so-called oblique manifold. We develop a geometric conjugate subgradient method that scales to large systems for solving the problem. Numerical results demonstrate the promising performance of the proposed algorithm compared to several state-of-the-art methods.
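
Factoring out the scaling ambiguity by constraining every column to unit norm places the optimization on the oblique manifold. The sketch below shows the two generic ingredients that gradient-type methods on this manifold need, a tangent-space projection and a column-normalizing retraction, assuming the matrix is stored as a list of columns. It performs a plain Riemannian gradient step for illustration and is not the geometric conjugate subgradient method of the paper; the function names are hypothetical.

```python
import math

def normalize_columns(cols):
    """Retraction onto the oblique manifold: rescale every column to unit Euclidean norm."""
    return [[v / math.sqrt(sum(c * c for c in col)) for v in col] for col in cols]

def project_tangent(cols, grad_cols):
    """Project a Euclidean gradient onto the tangent space at X, column by column:
    z_j - x_j * <x_j, z_j>, which removes the component that would change the column norm."""
    out = []
    for x, z in zip(cols, grad_cols):
        inner = sum(a * b for a, b in zip(x, z))
        out.append([zi - inner * xi for xi, zi in zip(x, z)])
    return out

def riemannian_step(cols, grad_cols, step):
    """One projected-gradient step followed by the normalization retraction."""
    g = project_tangent(cols, grad_cols)
    moved = [[xi - step * gi for xi, gi in zip(x, gc)] for x, gc in zip(cols, g)]
    return normalize_columns(moved)
```

A conjugate (sub)gradient method would additionally combine the projected gradient with a transported previous search direction, but the projection and retraction stay the same.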
