Source author record

Matthias Hein

Matthias Hein appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computer Vision math.OC Cryptography and Security Data Structures and Algorithms math.NA math.SP Numerical Analysis Discrete Mathematics math.FA math.ST Statistics Theory Computational Complexity eess.IV Hardware Architecture math.PR Methodology Social and Information Networks

Catalog footprint

What is connected

48works

18topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Adversarial Robustness against Multiple and Single $l_p$-Threat Models via Quick Fine-Tuning of Robust Classifiers

A major drawback of adversarially robust models, in particular for large scale datasets like ImageNet, is the extremely long training time compared to standard ones. Moreover, models should be robust not only to one $l_p$-threat model but ideally to all of them. In this paper we propose Extreme norm Adversarial Training (E-AT) for multiple-norm robustness which is based on geometric properties of $l_p$-balls. E-AT costs up to three times less than other adversarial training methods for multiple-norm robustness. Using E-AT we show that for ImageNet a single epoch and for CIFAR-10 three epochs are sufficient to turn any $l_p$-robust model into a multiple-norm robust model. In this way we get the first multiple-norm robust model for ImageNet and boost the state-of-the-art for multiple-norm robustness to more than $51\%$ on CIFAR-10. Finally, we study the general transfer via fine-tuning of adversarial robustness between different individual $l_p$-threat models and improve the previous SOTA $l_1$-robustness on both CIFAR-10 and ImageNet. Extensive experiments show that our scheme works across datasets and architectures including vision transformers.

preprint2022arXiv

Adversarial Robustness of MR Image Reconstruction under Realistic Perturbations

Deep Learning (DL) methods have shown promising results for solving ill-posed inverse problems such as MR image reconstruction from undersampled $k$-space data. However, these approaches currently have no guarantees for reconstruction quality and the reliability of such algorithms is only poorly understood. Adversarial attacks offer a valuable tool to understand possible failure modes and worst case performance of DL-based reconstruction algorithms. In this paper we describe adversarial attacks on multi-coil $k$-space measurements and evaluate them on the recently proposed E2E-VarNet and a simpler UNet-based model. In contrast to prior work, the attacks are targeted to specifically alter diagnostically relevant regions. Using two realistic attack models (adversarial $k$-space noise and adversarial rotations) we are able to show that current state-of-the-art DL-based reconstruction algorithms are indeed sensitive to such perturbations to a degree where relevant diagnostic information may be lost. Surprisingly, in our experiments the UNet and the more sophisticated E2E-VarNet were similarly sensitive to such attacks. Our findings add further to the evidence that caution must be exercised as DL-based methods move closer to clinical practice.

preprint2022arXiv

An Infinite-Feature Extension for Bayesian ReLU Nets That Fixes Their Asymptotic Overconfidence

A Bayesian treatment can mitigate overconfidence in ReLU nets around the training data. But far away from them, ReLU Bayesian neural networks (BNNs) can still underestimate uncertainty and thus be asymptotically overconfident. This issue arises since the output variance of a BNN with finitely many features is quadratic in the distance from the data region. Meanwhile, Bayesian linear models with ReLU features converge, in the infinite-width limit, to a particular Gaussian process (GP) with a variance that grows cubically so that no asymptotic overconfidence can occur. While this may seem of mostly theoretical interest, in this work, we show that it can be used in practice to the benefit of BNNs. We extend finite ReLU BNNs with infinite ReLU features via the GP and show that the resulting model is asymptotically maximally uncertain far away from the data while the BNNs' predictive power is unaffected near the data. Although the resulting model approximates a full GP posterior, thanks to its structure, it can be applied \emph{post-hoc} to any pre-trained ReLU BNN at a low cost.

preprint2022arXiv

Being a Bit Frequentist Improves Bayesian Neural Networks

Despite their compelling theoretical properties, Bayesian neural networks (BNNs) tend to perform worse than frequentist methods in classification-based uncertainty quantification (UQ) tasks such as out-of-distribution (OOD) detection. In this paper, based on empirical findings in prior works, we hypothesize that this issue is because even recent Bayesian methods have never considered OOD data in their training processes, even though this "OOD training" technique is an integral part of state-of-the-art frequentist UQ methods. To validate this, we treat OOD data as a first-class citizen in BNN training by exploring four different ways of incorporating OOD data into Bayesian inference. We show in extensive experiments that OOD-trained BNNs are competitive to recent frequentist baselines. This work thus provides strong baselines for future work in Bayesian UQ.

preprint2022arXiv

Breaking Down Out-of-Distribution Detection: Many Methods Based on OOD Training Data Estimate a Combination of the Same Core Quantities

It is an important problem in trustworthy machine learning to recognize out-of-distribution (OOD) inputs which are inputs unrelated to the in-distribution task. Many out-of-distribution detection methods have been suggested in recent years. The goal of this paper is to recognize common objectives as well as to identify the implicit scoring functions of different OOD detection methods. We focus on the sub-class of methods that use surrogate OOD data during training in order to learn an OOD detection score that generalizes to new unseen out-distributions at test time. We show that binary discrimination between in- and (different) out-distributions is equivalent to several distinct formulations of the OOD detection problem. When trained in a shared fashion with a standard classifier, this binary discriminator reaches an OOD detection performance similar to that of Outlier Exposure. Moreover, we show that the confidence loss which is used by Outlier Exposure has an implicit scoring function which differs in a non-trivial fashion from the theoretically optimal scoring function in the case where training and test out-distribution are the same, which again is similar to the one used when training an Energy-Based OOD detector or when adding a background class. In practice, when trained in exactly the same way, all these methods perform similarly.

preprint2022arXiv

Evaluating the Adversarial Robustness of Adaptive Test-time Defenses

Adaptive defenses, which optimize at test time, promise to improve adversarial robustness. We categorize such adaptive test-time defenses, explain their potential benefits and drawbacks, and evaluate a representative variety of the latest adaptive defenses for image classification. Unfortunately, none significantly improve upon static defenses when subjected to our careful case study evaluation. Some even weaken the underlying static model while simultaneously increasing inference computation. While these results are disappointing, we still believe that adaptive test-time defenses are a promising avenue of research and, as such, we provide recommendations for their thorough evaluation. We extend the checklist of Carlini et al. (2019) by providing concrete steps specific to adaptive defenses.

preprint2022arXiv

Provably Adversarially Robust Nearest Prototype Classifiers

Nearest prototype classifiers (NPCs) assign to each input point the label of the nearest prototype with respect to a chosen distance metric. A direct advantage of NPCs is that the decisions are interpretable. Previous work could provide lower bounds on the minimal adversarial perturbation in the $\ell_p$-threat model when using the same $\ell_p$-distance for the NPCs. In this paper we provide a complete discussion on the complexity when using $\ell_p$-distances for decision and $\ell_q$-threat models for certification for $p,q \in \{1,2,\infty\}$. In particular we provide scalable algorithms for the \emph{exact} computation of the minimal adversarial perturbation when using $\ell_2$-distance and improved lower bounds in other cases. Using efficient improved lower bounds we train our Provably adversarially robust NPC (PNPC), for MNIST which have better $\ell_2$-robustness guarantees than neural networks. Additionally, we show up to our knowledge the first certification results w.r.t. to the LPIPS perceptual metric which has been argued to be a more realistic threat model for image classification than $\ell_p$-balls. Our PNPC has on CIFAR10 higher certified robust accuracy than the empirical robust accuracy reported in (Laidlaw et al., 2021). The code is available in our repository.

preprint2022arXiv

Random and Adversarial Bit Error Robustness: Energy-Efficient and Secure DNN Accelerators

Deep neural network (DNN) accelerators received considerable attention in recent years due to the potential to save energy compared to mainstream hardware. Low-voltage operation of DNN accelerators allows to further reduce energy consumption, however, causes bit-level failures in the memory storing the quantized weights. Furthermore, DNN accelerators are vulnerable to adversarial attacks on voltage controllers or individual bits. In this paper, we show that a combination of robust fixed-point quantization, weight clipping, as well as random bit error training (RandBET) or adversarial bit error training (AdvBET) improves robustness against random or adversarial bit errors in quantized DNN weights significantly. This leads not only to high energy savings for low-voltage operation as well as low-precision quantization, but also improves security of DNN accelerators. In contrast to related work, our approach generalizes across operating voltages and accelerators and does not require hardware changes. Moreover, we present a novel adversarial bit error attack and are able to obtain robustness against both targeted and untargeted bit-level attacks. Without losing more than 0.8%/2% in test accuracy, we can reduce energy consumption on CIFAR10 by 20%/30% for 8/4-bit quantization. Allowing up to 320 adversarial bit errors, we reduce test error from above 90% (chance level) to 26.22%.

preprint2022arXiv

Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks

We propose a versatile framework based on random search, Sparse-RS, for score-based sparse targeted and untargeted attacks in the black-box setting. Sparse-RS does not rely on substitute models and achieves state-of-the-art success rate and query efficiency for multiple sparse attack models: $l_0$-bounded perturbations, adversarial patches, and adversarial frames. The $l_0$-version of untargeted Sparse-RS outperforms all black-box and even all white-box attacks for different models on MNIST, CIFAR-10, and ImageNet. Moreover, our untargeted Sparse-RS achieves very high success rates even for the challenging settings of $20\times20$ adversarial patches and $2$-pixel wide adversarial frames for $224\times224$ images. Finally, we show that Sparse-RS can be applied to generate targeted universal adversarial patches where it significantly outperforms the existing approaches. The code of our framework is available at https://github.com/fra31/sparse-rs.

preprint2021arXiv

A unifying Perron-Frobenius theorem for nonnegative tensors via multi-homogeneous maps

We introduce the concept of shape partition of a tensor and formulate a general tensor eigenvalue problem that includes all previously studied eigenvalue problems as special cases. We formulate irreducibility and symmetry properties of a nonnegative tensor $T$ in terms of the associated shape partition. We recast the eigenvalue problem for $T$ as a fixed point problem on a suitable product of projective spaces. This allows us to use the theory of multi-homogeneous order-preserving maps to derive a new and unifying Perron-Frobenius theorem for nonnegative tensors which either implies earlier results of this kind or improves them, as weaker assumptions are required. We introduce a general power method for the computation of the dominant tensor eigenpair, and provide a detailed convergence analysis.

preprint2021arXiv

The Perron-Frobenius theorem for multi-homogeneous mappings

The Perron-Frobenius theory for nonnegative matrices has been generalized to order-preserving homogeneous mappings on a cone and more recently to nonnegative multilinear forms. We unify both approaches by introducing the concept of order-preserving multi-homogeneous mappings, their associated nonlinear spectral problems and spectral radii. We show several Perron-Frobenius type results for these mappings addressing existence, uniqueness and maximality of nonnegative and positive eigenpairs. We prove a Collatz-Wielandt principle and other characterizations of the spectral radius and analyze the convergence of iterates of these mappings towards their unique positive eigenvectors. On top of providing a new extension of the nonlinear Perron-Frobenius theory to the multi-dimensional case, our contribution poses the basis for several improvements and a deeper understanding of the current spectral theory for nonnegative tensors. In fact, in recent years, important results have been obtained by recasting certain spectral equations for multilinear forms in terms of homogeneous maps, however as our approach is more adapted to such problems, these results can be further refined and improved by employing our new multi-homogeneous setting.

preprint2020arXiv

Adversarial Robustness on In- and Out-Distribution Improves Explainability

Neural networks have led to major improvements in image classification but suffer from being non-robust to adversarial changes, unreliable uncertainty estimates on out-distribution samples and their inscrutable black-box decisions. In this work we propose RATIO, a training procedure for Robustness via Adversarial Training on In- and Out-distribution, which leads to robust models with reliable and robust confidence estimates on the out-distribution. RATIO has similar generative properties to adversarial training so that visual counterfactuals produce class specific features. While adversarial training comes at the price of lower clean accuracy, RATIO achieves state-of-the-art $l_2$-adversarial robustness on CIFAR10 and maintains better clean accuracy.

preprint2020arXiv

Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks

The point estimates of ReLU classification networks---arguably the most widely used neural network architecture---have been shown to yield arbitrarily high confidence far away from the training data. This architecture, in conjunction with a maximum a posteriori estimation scheme, is thus not calibrated nor robust. Approximate Bayesian inference has been empirically demonstrated to improve predictive uncertainty in neural networks, although the theoretical analysis of such Bayesian approximations is limited. We theoretically analyze approximate Gaussian distributions on the weights of ReLU networks and show that they fix the overconfidence problem. Furthermore, we show that even a simplistic, thus cheap, Bayesian approximation, also fixes these issues. This indicates that a sufficient condition for a calibrated uncertainty on a ReLU network is "to be a bit Bayesian". These theoretical results validate the usage of last-layer Bayesian approximation and motivate a range of a fidelity-cost trade-off. We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.

preprint2020arXiv

Computing the norm of nonnegative matrices and the log-Sobolev constant of Markov chains

We analyze the global convergence of the power iterates for the computation of a general mixed-subordinate matrix norm. We prove a new global convergence theorem for a class of entrywise nonnegative matrices that generalizes and improves a well-known results for mixed-subordinate $\ell^p$ matrix norms. In particular, exploiting the Birkoff--Hopf contraction ratio of nonnegative matrices, we obtain novel and explicit global convergence guarantees for a range of matrix norms whose computation has been recently proven to be NP-hard in the general case, including the case of mixed-subordinate norms induced by the vector norms made by the sum of different $\ell^p$-norms of subsets of entries. Finally, we use the new results combined with hypercontractive inequalities to prove a new lower bound on the logarithmic Sobolev constant of a Markov chain.

preprint2020arXiv

Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks

Adversarial training yields robust models against a specific threat model, e.g., $L_\infty$ adversarial examples. Typically robustness does not generalize to previously unseen threat models, e.g., other $L_p$ norms, or larger perturbations. Our confidence-calibrated adversarial training (CCAT) tackles this problem by biasing the model towards low confidence predictions on adversarial examples. By allowing to reject examples with low confidence, robustness generalizes beyond the threat model employed during training. CCAT, trained only on $L_\infty$ adversarial examples, increases robustness against larger $L_\infty$, $L_2$, $L_1$ and $L_0$ attacks, adversarial frames, distal adversarial examples and corrupted examples and yields better clean accuracy compared to adversarial training. For thorough evaluation we developed novel white- and black-box attacks directly attacking CCAT by maximizing confidence. For each threat model, we use $7$ attacks with up to $50$ restarts and $5000$ iterations and report worst-case robust test error, extended to our confidence-thresholded setting, across all attacks.

preprint2020arXiv

Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack

The evaluation of robustness against adversarial manipulation of neural networks-based classifiers is mainly tested with empirical attacks as methods for the exact computation, even when available, do not scale to large networks. We propose in this paper a new white-box adversarial attack wrt the $l_p$-norms for $p \in \{1,2,\infty\}$ aiming at finding the minimal perturbation necessary to change the class of a given input. It has an intuitive geometric meaning, yields quickly high quality results, minimizes the size of the perturbation (so that it returns the robust accuracy at every threshold with a single run). It performs better or similar to state-of-the-art attacks which are partially specialized to one $l_p$-norm, and is robust to the phenomenon of gradient masking.

preprint2020arXiv

Provable robustness against all adversarial $l_p$-perturbations for $p\geq 1$

In recent years several adversarial attacks and defenses have been proposed. Often seemingly robust models turn out to be non-robust when more sophisticated attacks are used. One way out of this dilemma are provable robustness guarantees. While provably robust models for specific $l_p$-perturbation models have been developed, we show that they do not come with any guarantee against other $l_q$-perturbations. We propose a new regularization scheme, MMR-Universal, for ReLU networks which enforces robustness wrt $l_1$- and $l_\infty$-perturbations and show how that leads to the first provably robust models wrt any $l_p$-norm for $p\geq 1$.

preprint2020arXiv

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

The field of defense strategies against adversarial attacks has significantly grown over the last years, but progress is hampered as the evaluation of adversarial defenses is often insufficient and thus gives a wrong impression of robustness. Many promising defenses could be broken later on, making it difficult to identify the state-of-the-art. Frequent pitfalls in the evaluation are improper tuning of hyperparameters of the attacks, gradient obfuscation or masking. In this paper we first propose two extensions of the PGD-attack overcoming failures due to suboptimal step size and problems of the objective function. We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness. We apply our ensemble to over 50 models from papers published at recent top machine learning and computer vision venues. In all except one of the cases we achieve lower robust test accuracy than reported in these papers, often by more than $10\%$, identifying several broken defenses.

preprint2020arXiv

Square Attack: a query-efficient black-box adversarial attack via random search

We propose the Square Attack, a score-based black-box $l_2$- and $l_\infty$-adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking. Square Attack is based on a randomized search scheme which selects localized square-shaped updates at random positions so that at each iteration the perturbation is situated approximately at the boundary of the feasible set. Our method is significantly more query efficient and achieves a higher success rate compared to the state-of-the-art methods, especially in the untargeted setting. In particular, on ImageNet we improve the average query efficiency in the untargeted setting for various deep networks by a factor of at least $1.8$ and up to $3$ compared to the recent state-of-the-art $l_\infty$-attack of Al-Dujaili & O'Reilly. Moreover, although our attack is black-box, it can also outperform gradient-based white-box attacks on the standard benchmarks achieving a new state-of-the-art in terms of the success rate. The code of our attack is available at https://github.com/max-andr/square-attack.

preprint2020arXiv

Towards neural networks that provably know when they don't know

It has recently been shown that ReLU networks produce arbitrarily over-confident predictions far away from the training data. Thus, ReLU networks do not know when they don't know. However, this is a highly important property in safety critical applications. In the context of out-of-distribution detection (OOD) there have been a number of proposals to mitigate this problem but none of them are able to make any mathematical guarantees. In this paper we propose a new approach to OOD which overcomes both problems. Our approach can be used with ReLU networks and provides provably low confidence predictions far away from the training data as well as the first certificates for low confidence predictions in a neighborhood of an out-distribution point. In the experiments we show that state-of-the-art methods fail in this worst-case setting whereas our model can guarantee its performance while retaining state-of-the-art OOD performance.

preprint2016arXiv

A nodal domain theorem and a higher-order Cheeger inequality for the graph $p$-Laplacian

We consider the nonlinear graph $p$-Laplacian and its set of eigenvalues and associated eigenfunctions of this operator defined by a variational principle. We prove a nodal domain theorem for the graph $p$-Laplacian for any $p\geq 1$. While for $p>1$ the bounds on the number of weak and strong nodal domains are the same as for the linear graph Laplacian ($p=2$), the behavior changes for $p=1$. We show that the bounds are tight for $p\geq 1$ as the bounds are attained by the eigenfunctions of the graph $p$-Laplacian on two graphs. Finally, using the properties of the nodal domains, we prove a higher-order Cheeger inequality for the graph $p$-Laplacian for $p>1$. If the eigenfunction associated to the $k$-th variational eigenvalue of the graph $p$-Laplacian has exactly $k$ strong nodal domains, then the higher order Cheeger inequality becomes tight as $p\rightarrow 1$.

preprint2016arXiv

An Efficient Multilinear Optimization Framework for Hypergraph Matching

Hypergraph matching has recently become a popular approach for solving correspondence problems in computer vision as it allows to integrate higher-order geometric information. Hypergraph matching can be formulated as a third-order optimization problem subject to the assignment constraints which turns out to be NP-hard. In recent work, we have proposed an algorithm for hypergraph matching which first lifts the third-order problem to a fourth-order problem and then solves the fourth-order problem via optimization of the corresponding multilinear form. This leads to a tensor block coordinate ascent scheme which has the guarantee of providing monotonic ascent in the original matching score function and leads to state-of-the-art performance both in terms of achieved matching score and accuracy. In this paper we show that the lifting step to a fourth-order problem can be avoided yielding a third-order scheme with the same guarantees and performance but being two times faster. Moreover, we introduce a homotopy type method which further improves the performance.

preprint2016arXiv

Analysis and Optimization of Loss Functions for Multiclass, Top-k, and Multilabel Classification

Top-k error is currently a popular performance measure on large scale image classification benchmarks such as ImageNet and Places. Despite its wide acceptance, our understanding of this metric is limited as most of the previous research is focused on its special case, the top-1 error. In this work, we explore two directions that shed more light on the top-k error. First, we provide an in-depth analysis of established and recently proposed single-label multiclass methods along with a detailed account of efficient optimization algorithms for them. Our results indicate that the softmax loss and the smooth multiclass SVM are surprisingly competitive in top-k error uniformly across all k, which can be explained by our analysis of multiclass top-k calibration. Further improvements for a specific k are possible with a number of proposed top-k loss functions. Second, we use the top-k methods to explore the transition from multiclass to multilabel learning. In particular, we find that it is possible to obtain effective multilabel classifiers on Pascal VOC using a single label per image for training, while the gap between multiclass and multilabel methods on MS COCO is more significant. Finally, our contribution of efficient algorithms for training with the considered top-k and multilabel loss functions is of independent interest.

preprint2016arXiv

Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods

The optimization problem behind neural networks is highly non-convex. Training with stochastic gradient descent and variants requires careful parameter tuning and provides no guarantee to achieve the global optimum. In contrast we show under quite weak assumptions on the data that a particular class of feedforward neural networks can be trained globally optimal with a linear convergence rate with our nonlinear spectral method. Up to our knowledge this is the first practically feasible method which achieves such a guarantee. While the method can in principle be applied to deep networks, we restrict ourselves for simplicity in this paper to one and two hidden layer networks. Our experiments confirm that these models are rich enough to achieve good performance on a series of real-world datasets.

preprint2016arXiv

Improved Image Boundaries for Better Video Segmentation

Graph-based video segmentation methods rely on superpixels as starting point. While most previous work has focused on the construction of the graph edges and weights as well as solving the graph partitioning problem, this paper focuses on better superpixels for video segmentation. We demonstrate by a comparative analysis that superpixels extracted from boundaries perform best, and show that boundary estimation can be significantly improved via image and time domain cues. With superpixels generated from our better boundaries we observe consistent improvement for two video segmentation methods in two different datasets.

preprint2016arXiv

Latent Embeddings for Zero-shot Classification

We present a novel latent embedding model for learning a compatibility function between image and class embeddings, in the context of zero-shot classification. The proposed method augments the state-of-the-art bilinear compatibility model by incorporating latent variables. Instead of learning a single bilinear map, it learns a collection of maps with the selection, of which map to use, being a latent variable for the current image-class pair. We train the model with a ranking based objective function which penalizes incorrect rankings of the true class for a given image. We empirically demonstrate that our model improves the state-of-the-art for various class embeddings consistently on three challenging publicly available datasets for the zero-shot setting. Moreover, our method leads to visually highly interpretable results with clear clusters of different fine-grained object properties that correspond to different latent variable maps.

preprint2016arXiv

Loss Functions for Top-k Error: Analysis and Insights

In order to push the performance on realistic computer vision tasks, the number of classes in modern benchmark datasets has significantly increased in recent years. This increase in the number of classes comes along with increased ambiguity between the class labels, raising the question if top-1 error is the right performance measure. In this paper, we provide an extensive comparison and evaluation of established multiclass methods comparing their top-k performance both from a practical as well as from a theoretical perspective. Moreover, we introduce novel top-k loss functions as modifications of the softmax and the multiclass SVM losses and provide efficient optimization schemes for them. In the experiments, we compare on various datasets all of the proposed and established methods for top-k error optimization. An interesting insight of this paper is that the softmax loss yields competitive top-k performance for all k simultaneously. For a specific top-k error, our new top-k losses lead typically to further improvements while being faster to train than the softmax.

preprint2016arXiv

Simple Does It: Weakly Supervised Instance and Semantic Segmentation

Semantic labelling and instance segmentation are two tasks that require particularly costly annotations. Starting from weak supervision in the form of bounding box detection annotations, we propose a new approach that does not require modification of the segmentation training procedure. We show that when carefully designing the input labels from given bounding boxes, even a single round of training is enough to improve over previously reported weakly supervised results. Overall, our weak supervision approach reaches ~95% of the quality of the fully supervised model, both for semantic labelling and instance segmentation.

preprint2015arXiv

A Flexible Tensor Block Coordinate Ascent Scheme for Hypergraph Matching

The estimation of correspondences between two images resp. point sets is a core problem in computer vision. One way to formulate the problem is graph matching leading to the quadratic assignment problem which is NP-hard. Several so called second order methods have been proposed to solve this problem. In recent years hypergraph matching leading to a third order problem became popular as it allows for better integration of geometric information. For most of these third order algorithms no theoretical guarantees are known. In this paper we propose a general framework for tensor block coordinate ascent methods for hypergraph matching. We propose two algorithms which both come along with the guarantee of monotonic ascent in the matching score on the set of discrete assignment matrices. In the experiments we show that our new algorithms outperform previous work both in terms of achieving better matching scores and matching accuracy. This holds in particular for very challenging settings where one has a high number of outliers and other forms of noise.

preprint2015arXiv

Constrained 1-Spectral Clustering

An important form of prior information in clustering comes in form of cannot-link and must-link constraints. We present a generalization of the popular spectral clustering technique which integrates such constraints. Motivated by the recently proposed $1$-spectral clustering for the unconstrained problem, our method is based on a tight relaxation of the constrained normalized cut into a continuous optimization problem. Opposite to all other methods which have been suggested for constrained spectral clustering, we can always guarantee to satisfy all constraints. Moreover, our soft formulation allows to optimize a trade-off between normalized cut and the number of violated constraints. An efficient implementation is provided which scales to large datasets. We outperform consistently all other proposed methods in the experiments.

preprint2015arXiv

Efficient Output Kernel Learning for Multiple Tasks

The paradigm of multi-task learning is that one can achieve better generalization by learning tasks jointly and thus exploiting the similarity between the tasks rather than learning them independently of each other. While previously the relationship between tasks had to be user-defined in the form of an output kernel, recent approaches jointly learn the tasks and the output kernel. As the output kernel is a positive semidefinite matrix, the resulting optimization problems are not scalable in the number of tasks as an eigendecomposition is required in each step. \mbox{Using} the theory of positive semidefinite kernels we show in this paper that for a certain class of regularizers on the output kernel, the constraint of being positive semidefinite can be dropped as it is automatically satisfied for the relaxed problem. This leads to an unconstrained dual problem which can be solved efficiently. Experiments on several multi-task and multi-class data sets illustrate the efficacy of our approach in terms of computational efficiency as well as generalization performance.

preprint2015arXiv

Regularization-free estimation in trace regression with symmetric positive semidefinite matrices

Over the past few years, trace regression models have received considerable attention in the context of matrix completion, quantum state tomography, and compressed sensing. Estimation of the underlying matrix from regularization-based approaches promoting low-rankedness, notably nuclear norm regularization, have enjoyed great popularity. In the present paper, we argue that such regularization may no longer be necessary if the underlying matrix is symmetric positive semidefinite (\textsf{spd}) and the design satisfies certain conditions. In this situation, simple least squares estimation subject to an \textsf{spd} constraint may perform as well as regularization-based approaches with a proper choice of the regularization parameter, which entails knowledge of the noise level and/or tuning. By contrast, constrained least squares estimation comes without any tuning parameter and may hence be preferred due to its simplicity.

preprint2015arXiv

Robust PCA: Optimization of the Robust Reconstruction Error over the Stiefel Manifold

It is well known that Principal Component Analysis (PCA) is strongly affected by outliers and a lot of effort has been put into robustification of PCA. In this paper we present a new algorithm for robust PCA minimizing the trimmed reconstruction error. By directly minimizing over the Stiefel manifold, we avoid deflation as often used by projection pursuit methods. In distinction to other methods for robust PCA, our method has no free parameter and is computationally very efficient. We illustrate the performance on various datasets including an application to background modeling and subtraction. Our method performs better or similar to current state-of-the-art methods while being faster.

preprint2015arXiv

Tensor norm and maximal singular vectors of non-negative tensors - a Perron-Frobenius theorem, a Collatz-Wielandt characterization and a generalized power method

We study the l^{p_1,...,p_m} singular value problem for non-negative tensors. We prove a general Perron-Frobenius theorem for weakly irreducible and irreducible nonnegative tensors and provide a Collatz-Wielandt characterization of the maximal singular value. Additionally, we propose a higher order power method for the computation of the maximal singular vectors and show that it has an asymptotic linear convergence rate.

preprint2015arXiv

Tight Continuous Relaxation of the Balanced $k$-Cut Problem

Spectral Clustering as a relaxation of the normalized/ratio cut has become one of the standard graph-based clustering methods. Existing methods for the computation of multiple clusters, corresponding to a balanced $k$-cut of the graph, are either based on greedy techniques or heuristics which have weak connection to the original motivation of minimizing the normalized cut. In this paper we propose a new tight continuous relaxation for any balanced $k$-cut problem and show that a related recently proposed relaxation is in most cases loose leading to poor performance in practice. For the optimization of our tight continuous relaxation we propose a new algorithm for the difficult sum-of-ratios minimization problem which achieves monotonic descent. Extensive comparisons show that our method outperforms all existing approaches for ratio cut and other balanced $k$-cut criteria.

preprint2015arXiv

Top-k Multiclass SVM

Class ambiguity is typical in image classification problems with a large number of classes. When classes are difficult to discriminate, it makes sense to allow k guesses and evaluate classifiers based on the top-k error instead of the standard zero-one loss. We propose top-k multiclass SVM as a direct method to optimize for top-k performance. Our generalization of the well-known multiclass SVM is based on a tight convex upper bound of the top-k error. We propose a fast optimization scheme based on an efficient projection onto the top-k simplex, which is of its own interest. Experiments on five datasets show consistent improvements in top-k accuracy compared to various baselines.

preprint2015arXiv

Towards Realistic Team Formation in Social Networks based on Densest Subgraphs

Given a task $\mathcal{T}$, a set of experts $V$ with multiple skills and a social network $G(V, W)$ reflecting the compatibility among the experts, team formation is the problem of identifying a team $C \subseteq V$ that is both competent in performing the task $\mathcal{T}$ and compatible in working together. Existing methods for this problem make too restrictive assumptions and thus cannot model practical scenarios. The goal of this paper is to consider the team formation problem in a realistic setting and present a novel formulation based on densest subgraphs. Our formulation allows modeling of many natural requirements such as (i) inclusion of a designated team leader and/or a group of given experts, (ii) restriction of the size or more generally cost of the team (iii) enforcing locality of the team, e.g., in a geographical sense or social sense, etc. The proposed formulation leads to a generalized version of the classical densest subgraph problem with cardinality constraints (DSP), which is an NP hard problem and has many applications in social network analysis. In this paper, we present a new method for (approximately) solving the generalized DSP (GDSP). Our method, FORTE, is based on solving an equivalent continuous relaxation of GDSP. The solution found by our method has a quality guarantee and always satisfies the constraints of GDSP. Experiments show that the proposed formulation (GDSP) is useful in modeling a broader range of team formation problems and that our method produces more coherent and compact teams of high quality. We also show, with the help of an LP relaxation of GDSP, that our method gives close to optimal solutions to GDSP.

preprint2015arXiv

Weakly Supervised Object Boundaries

State-of-the-art learning based boundary detection methods require extensive training data. Since labelling object boundaries is one of the most expensive types of annotations, there is a need to relax the requirement to carefully annotate images to make both the training more affordable and to extend the amount of training data. In this paper we propose a technique to generate weakly supervised annotations and show that bounding box annotations alone suffice to reach high-quality object boundaries without using any object-specific boundary annotations. With the proposed weak supervision techniques we achieve the top performance on the object boundary detection task, outperforming by a large margin the current fully supervised state-of-the-art methods.

preprint2014arXiv

Estimation of positive definite M-matrices and structure learning for attractive Gaussian Markov Random fields

Consider a random vector with finite second moments. If its precision matrix is an M-matrix, then all partial correlations are non-negative. If that random vector is additionally Gaussian, the corresponding Markov random field (GMRF) is called attractive. We study estimation of M-matrices taking the role of inverse second moment or precision matrices using sign-constrained log-determinant divergence minimization. We also treat the high-dimensional case with the number of variables exceeding the sample size. The additional sign-constraints turn out to greatly simplify the estimation problem: we provide evidence that explicit regularization is no longer required. To solve the resulting convex optimization problem, we propose an algorithm based on block coordinate descent, in which each sub-problem can be recast as non-negative least squares problem. Illustrations on both simulated and real world data are provided.

preprint2014arXiv

Learning Using Privileged Information: SVM+ and Weighted SVM

Prior knowledge can be used to improve predictive performance of learning algorithms or reduce the amount of data required for training. The same goal is pursued within the learning using privileged information paradigm which was recently introduced by Vapnik et al. and is aimed at utilizing additional information available only at training time -- a framework implemented by SVM+. We relate the privileged information to importance weighting and show that the prior knowledge expressible with privileged features can also be encoded by weights associated with every training example. We show that a weighted SVM can always replicate an SVM+ solution, while the converse is not true and we construct a counterexample highlighting the limitations of SVM+. Finally, we touch on the problem of choosing weights for weighted SVMs when privileged features are not available.

preprint2014arXiv

Matrix factorization with Binary Components

Motivated by an application in computational biology, we consider low-rank matrix factorization with $\{0,1\}$-constraints on one of the factors and optionally convex constraints on the second one. In addition to the non-convexity shared with other matrix factorization schemes, our problem is further complicated by a combinatorial constraint set of size $2^{m \cdot r}$, where $m$ is the dimension of the data points and $r$ the rank of the factorization. Despite apparent intractability, we provide - in the line of recent work on non-negative matrix factorization by Arora et al. (2012) - an algorithm that provably recovers the underlying factorization in the exact case with $O(m r 2^r + mnr + r^2 n)$ operations for $n$ datapoints. To obtain this result, we use theory around the Littlewood-Offord lemma from combinatorics.

preprint2014arXiv

Non-negative least squares for high-dimensional linear models: consistency and sparse recovery without regularization

Least squares fitting is in general not useful for high-dimensional linear models, in which the number of predictors is of the same or even larger order of magnitude than the number of samples. Theory developed in recent years has coined a paradigm according to which sparsity-promoting regularization is regarded as a necessity in such setting. Deviating from this paradigm, we show that non-negativity constraints on the regression coefficients may be similarly effective as explicit regularization if the design matrix has additional properties, which are met in several applications of non-negative least squares (NNLS). We show that for these designs, the performance of NNLS with regard to prediction and estimation is comparable to that of the lasso. We argue further that in specific cases, NNLS may have a better $\ell_{\infty}$-rate in estimation and hence also advantages with respect to support recovery when combined with thresholding. From a practical point of view, NNLS does not depend on a regularization parameter and is hence easier to use.

preprint2014arXiv

Nonlinear Eigenproblems in Data Analysis - Balanced Graph Cuts and the RatioDCA-Prox

It has been recently shown that a large class of balanced graph cuts allows for an exact relaxation into a nonlinear eigenproblem. We review briefly some of these results and propose a family of algorithms to compute nonlinear eigenvectors which encompasses previous work as special cases. We provide a detailed analysis of the properties and the convergence behavior of these algorithms and then discuss their application in the area of balanced graph cuts.

preprint2013arXiv

Constrained fractional set programs and their application in local clustering and community detection

The (constrained) minimization of a ratio of set functions is a problem frequently occurring in clustering and community detection. As these optimization problems are typically NP-hard, one uses convex or spectral relaxations in practice. While these relaxations can be solved globally optimally, they are often too loose and thus lead to results far away from the optimum. In this paper we show that every constrained minimization problem of a ratio of non-negative set functions allows a tight relaxation into an unconstrained continuous optimization problem. This result leads to a flexible framework for solving constrained problems in network analysis. While a globally optimal solution for the resulting non-convex problem cannot be guaranteed, we outperform the loose convex or spectral relaxations by a large margin on constrained local clustering problems.

preprint2013arXiv

The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited

Hypergraphs allow one to encode higher-order relationships in data and are thus a very flexible modeling tool. Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. The key element is a family of regularization functionals based on the total variation on hypergraphs.

preprint2011arXiv

Hitting and commute times in large graphs are often misleading

Next to the shortest path distance, the second most popular distance function between vertices in a graph is the commute distance (resistance distance). For two vertices u and v, the hitting time H_{uv} is the expected time it takes a random walk to travel from u to v. The commute time is its symmetrized version C_{uv} = H_{uv} + H_{vu}. In our paper we study the behavior of hitting times and commute distances when the number n of vertices in the graph is very large. We prove that as n converges to infinty, hitting times and commute distances converge to expressions that do not take into account the global structure of the graph at all. Namely, the hitting time H_{uv} converges to 1/d_v and the commute time to 1/d_u + 1/d_v where d_u and d_v denote the degrees of vertices u and v. In these cases, the hitting and commute times are misleading in the sense that they do not provide information about the structure of the graph. We focus on two major classes of random graphs: random geometric graphs (k-nearest neighbor graphs, epsilon-graphs, Gaussian similarity graphs) and random graphs with given expected degrees (in particular, Erdos-Renyi graphs with and without planted partitions)

preprint2011arXiv

How the result of graph clustering methods depends on the construction of the graph

We study the scenario of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is if and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the outcome of the final clustering result. To this end we study the convergence of cluster quality measures such as the normalized cut or the Cheeger cut on various kinds of random geometric graphs as the sample size tends to infinity. It turns out that the limit values of the same objective function are systematically different on different types of graphs. This implies that clustering results systematically depend on the graph and can be very different for different types of graph. We provide examples to illustrate the implications on spectral clustering.

preprint2010arXiv

An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA

Many problems in machine learning and statistics can be formulated as (generalized) eigenproblems. In terms of the associated optimization problem, computing linear eigenvectors amounts to finding critical points of a quadratic function subject to quadratic constraints. In this paper we show that a certain class of constrained optimization problems with nonquadratic objective and constraints can be understood as nonlinear eigenproblems. We derive a generalization of the inverse power method which is guaranteed to converge to a nonlinear eigenvector. We apply the inverse power method to 1-spectral clustering and sparse PCA which can naturally be formulated as nonlinear eigenproblems. In both applications we achieve state-of-the-art results in terms of solution quality and runtime. Moving beyond the standard eigenproblem should be useful also in many other applications and our inverse power method can be easily adapted to new problems.

Matthias Hein

What is connected

Connect this record

See the researcher in context

Building this map preview

48 published item(s)

Adversarial Robustness against Multiple and Single $l_p$-Threat Models via Quick Fine-Tuning of Robust Classifiers

Adversarial Robustness of MR Image Reconstruction under Realistic Perturbations

An Infinite-Feature Extension for Bayesian ReLU Nets That Fixes Their Asymptotic Overconfidence

Being a Bit Frequentist Improves Bayesian Neural Networks

Breaking Down Out-of-Distribution Detection: Many Methods Based on OOD Training Data Estimate a Combination of the Same Core Quantities

Evaluating the Adversarial Robustness of Adaptive Test-time Defenses

Provably Adversarially Robust Nearest Prototype Classifiers

Random and Adversarial Bit Error Robustness: Energy-Efficient and Secure DNN Accelerators

Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks

A unifying Perron-Frobenius theorem for nonnegative tensors via multi-homogeneous maps

The Perron-Frobenius theorem for multi-homogeneous mappings

Adversarial Robustness on In- and Out-Distribution Improves Explainability

Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks

Computing the norm of nonnegative matrices and the log-Sobolev constant of Markov chains

Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks

Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack

Provable robustness against all adversarial $l_p$-perturbations for $p\geq 1$

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

Square Attack: a query-efficient black-box adversarial attack via random search

Towards neural networks that provably know when they don't know

A nodal domain theorem and a higher-order Cheeger inequality for the graph $p$-Laplacian

An Efficient Multilinear Optimization Framework for Hypergraph Matching

Analysis and Optimization of Loss Functions for Multiclass, Top-k, and Multilabel Classification

Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods

Improved Image Boundaries for Better Video Segmentation

Latent Embeddings for Zero-shot Classification

Loss Functions for Top-k Error: Analysis and Insights

Simple Does It: Weakly Supervised Instance and Semantic Segmentation

A Flexible Tensor Block Coordinate Ascent Scheme for Hypergraph Matching

Constrained 1-Spectral Clustering

Efficient Output Kernel Learning for Multiple Tasks

Regularization-free estimation in trace regression with symmetric positive semidefinite matrices

Robust PCA: Optimization of the Robust Reconstruction Error over the Stiefel Manifold

Tensor norm and maximal singular vectors of non-negative tensors - a Perron-Frobenius theorem, a Collatz-Wielandt characterization and a generalized power method

Tight Continuous Relaxation of the Balanced $k$-Cut Problem

Top-k Multiclass SVM

Towards Realistic Team Formation in Social Networks based on Densest Subgraphs

Weakly Supervised Object Boundaries

Estimation of positive definite M-matrices and structure learning for attractive Gaussian Markov Random fields

Learning Using Privileged Information: SVM+ and Weighted SVM

Matrix factorization with Binary Components

Non-negative least squares for high-dimensional linear models: consistency and sparse recovery without regularization

Nonlinear Eigenproblems in Data Analysis - Balanced Graph Cuts and the RatioDCA-Prox

Constrained fractional set programs and their application in local clustering and community detection

The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited

Hitting and commute times in large graphs are often misleading

How the result of graph clustering methods depends on the construction of the graph

An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA