Source author record

Saverio Salzo

Saverio Salzo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning · math.OC · math.ST · Statistics Theory · math.NA · Performance · Systems and Control

Catalog footprint

What is connected

6 works

7 topics

4 close collaborators

preprint · 2021 · arXiv

The method of Bregman projections in deterministic and stochastic convex feasibility problems

In this work we study the method of Bregman projections for deterministic and stochastic convex feasibility problems with three types of control sequences for the selection of sets during the algorithmic procedure: greedy, random, and adaptive random. We analyze in depth the case of affine feasibility problems, showing that the iterates generated by the proposed methods converge Q-linearly and providing explicit global and local rates of convergence. On the one hand, this work generalizes recent developments in randomized methods for the solution of linear systems based on orthogonal projection methods. On the other hand, our results yield, for the first time, global and local Q-linear rates of convergence for the Sinkhorn and Greenhorn algorithms in discrete entropic-regularized optimal transport, even in the multimarginal setting.
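The abstract relates its results to randomized orthogonal-projection methods for linear systems. The sketch below illustrates only that Euclidean special case (a randomized Kaczmarz iteration, plus a greedy variant), not the Bregman method of the paper. The function names, the sampling rule (rows drawn with probability proportional to their squared norm), and the small test system are assumptions made for illustration.

```python
import random


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def project_onto_hyperplane(x, a, b):
    """Orthogonal projection of x onto the hyperplane {z : <a, z> = b}."""
    scale = (dot(a, x) - b) / dot(a, a)
    return [xi - scale * ai for xi, ai in zip(x, a)]


def kaczmarz(A, b, x0, n_iters, control="random", seed=0):
    """Project repeatedly onto the hyperplanes <A[i], x> = b[i].

    control="random": pick row i with probability proportional to ||A[i]||^2
                      (an assumed, commonly used sampling rule).
    control="greedy": pick the hyperplane currently farthest from x.
    """
    rng = random.Random(seed)
    rows = range(len(A))
    sq_norms = [dot(row, row) for row in A]
    x = list(x0)
    for _ in range(n_iters):
        if control == "greedy":
            i = max(rows, key=lambda k: abs(dot(A[k], x) - b[k]) / sq_norms[k] ** 0.5)
        else:
            i = rng.choices(rows, weights=sq_norms)[0]
        x = project_onto_hyperplane(x, A[i], b[i])
    return x


# Hypothetical consistent system whose solution is (1, 2).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
b = [4.0, 7.0, -1.0]
x_random = kaczmarz(A, b, [0.0, 0.0], 200, control="random")
x_greedy = kaczmarz(A, b, [0.0, 0.0], 200, control="greedy")
```

For a consistent system each projection cannot increase the distance to the solution set, which is why both control rules approach the same point here.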

preprint · 2020 · arXiv

Efficient Tensor Kernel methods for sparse regression

Recently, classical kernel methods have been extended by the introduction of suitable tensor kernels so as to promote sparsity in the solution of the underlying regression problem. Indeed, they solve an lp-norm regularization problem, with p=m/(m-1) and m an even integer, which happens to be close to a lasso problem. However, a major drawback of the method is that storing tensors requires a considerable amount of memory, ultimately limiting its applicability. In this work we address this problem by proposing two advances. First, we directly reduce the memory requirement by introducing a new and more efficient layout for storing the data. Second, we use a Nyström-type subsampling approach, which allows for a training phase with a smaller number of data points, so as to reduce the computational cost. Experiments, both on synthetic and real datasets, show the effectiveness of the proposed improvements. Finally, we take care of implementing the code in C++ so as to further speed up the computation.

preprint · 2020 · arXiv

On the Iteration Complexity of Hypergradient Computation

We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of the upper-level objective (hypergradient) is hard or even impossible to compute exactly, which has raised interest in approximation methods. We investigate some popular approaches to compute the hypergradient, based on reverse-mode iterative differentiation and approximate implicit differentiation. Under the hypothesis that the fixed-point equation is defined by a contraction mapping, we present a unified analysis which allows, for the first time, a quantitative comparison of these methods, providing explicit bounds for their iteration complexity. This analysis suggests a hierarchy in terms of computational efficiency among the above methods, with approximate implicit differentiation based on conjugate gradient performing best. We present an extensive experimental comparison among the methods which confirms the theoretical findings.
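The two families of hypergradient approximations named in the abstract can be contrasted on a toy scalar problem. This is a minimal sketch under stated assumptions, not the authors' code. The contraction `phi`, the upper-level objective E(w) = 0.5 (w - 1)^2, and all function names are hypothetical. The implicit variant solves its linear system with a plain fixed-point iteration rather than conjugate gradient.

```python
import math


def phi(w, lam):
    """Assumed contraction in w (|d phi / d w| <= 0.5) with parameter lam."""
    return 0.5 * math.tanh(w) + lam


def dphi_dw(w, lam):
    return 0.5 * (1.0 - math.tanh(w) ** 2)


def dphi_dlam(w, lam):
    return 1.0


def dE_dw(w):
    """Derivative of the assumed upper-level objective E(w) = 0.5 * (w - 1)^2."""
    return w - 1.0


def fixed_point_iterates(lam, t, w0=0.0):
    """Return [w_0, ..., w_t] with w_{k+1} = phi(w_k, lam)."""
    ws = [w0]
    for _ in range(t):
        ws.append(phi(ws[-1], lam))
    return ws


def hypergrad_itd(lam, t):
    """Reverse-mode iterative differentiation: backpropagate through t steps."""
    ws = fixed_point_iterates(lam, t)
    adjoint = dE_dw(ws[-1])           # sensitivity of E with respect to w_t
    grad = 0.0                        # E has no direct dependence on lam here
    for w in reversed(ws[:-1]):       # visits w_{t-1}, ..., w_0
        grad += adjoint * dphi_dlam(w, lam)
        adjoint *= dphi_dw(w, lam)
    return grad


def hypergrad_aid(lam, t, k):
    """Approximate implicit differentiation with a fixed-point linear solver."""
    w = fixed_point_iterates(lam, t)[-1]
    g = dE_dw(w)
    v = 0.0
    for _ in range(k):                # approximates v = g / (1 - dphi_dw)
        v = dphi_dw(w, lam) * v + g
    return dphi_dlam(w, lam) * v


itd_value = hypergrad_itd(0.3, 60)
aid_value = hypergrad_aid(0.3, 60, 60)
```

With enough iterations both estimates approach the same value, and the interesting regime studied in the abstract is how quickly each one gets there for small `t` and `k`.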

preprint · 2016 · arXiv

Regularized Learning Schemes in Feature Banach Spaces

This paper proposes a unified framework for the investigation of constrained learning theory in reflexive Banach spaces of features via regularized empirical risk minimization. The focus is placed on Tikhonov-like regularization with totally convex functions. This broad class of regularizers provides a flexible model for various priors on the features, including in particular hard constraints and powers of Banach norms. In such context, the main results establish a new general form of the representer theorem and the consistency of the corresponding learning schemes under general conditions on the loss function, the geometry of the feature space, and the modulus of total convexity of the regularizer. In addition, the proposed analysis gives new insight into basic tools such as reproducing Banach spaces, feature maps, and universality. Even when specialized to Hilbert spaces, this framework yields new results that extend the state of the art.

preprint · 2015 · arXiv

Consistent Learning by Composite Proximal Thresholding

We investigate the modeling and the numerical solution of machine learning problems with prediction functions which are linear combinations of elements of a possibly infinite-dimensional dictionary. We propose a novel flexible composite regularization model, which makes it possible to incorporate various priors on the coefficients of the prediction function, including sparsity and hard constraints. We show that the estimators obtained by minimizing the regularized empirical risk are consistent in a statistical sense, and we design an error-tolerant composite proximal thresholding algorithm for computing such estimators. New results on the asymptotic behavior of the proximal forward-backward splitting method are derived and exploited to establish the convergence properties of the proposed algorithm. In particular, our method features a $o(1/m)$ convergence rate in objective values.
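The proximal forward-backward splitting mentioned in the abstract alternates a gradient step on the smooth term with a proximal step on the regularizer. The sketch below shows the textbook instance with a least-squares loss and an l1 penalty, whose proximal map is soft thresholding. It is not the composite algorithm of the paper, and the function names and the tiny test problem are assumptions.

```python
def soft_threshold(v, tau):
    """Proximal map of tau * |.|: shrink v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def forward_backward(A, y, lam, step, n_iters):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 by forward-backward splitting.

    step is assumed to satisfy step <= 1 / L, with L the largest eigenvalue
    of A^T A, so that the objective values do not increase.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    for _ in range(n_iters):
        residual = [sum(a * xj for a, xj in zip(row, x)) - yi
                    for row, yi in zip(A, y)]
        # Forward (gradient) step on the smooth least-squares term.
        grad = [sum(A[i][j] * residual[i] for i in range(n_rows))
                for j in range(n_cols)]
        # Backward (proximal) step on the l1 term.
        x = [soft_threshold(xj - step * gj, step * lam)
             for xj, gj in zip(x, grad)]
    return x


# Hypothetical problem with A = identity: the minimizer is soft_threshold(y, lam).
identity = [[1.0, 0.0], [0.0, 1.0]]
x_hat = forward_backward(identity, [3.0, 0.5], lam=1.0, step=1.0, n_iters=10)
```

In this example the second coefficient is set exactly to zero, which is the sparsity-promoting effect of the thresholding step.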

preprint · 2011 · arXiv

Convergence analysis of a proximal Gauss-Newton method

An extension of the Gauss-Newton algorithm is proposed to find local minimizers of penalized nonlinear least squares problems, under generalized Lipschitz assumptions. Convergence results of local type are obtained, as well as an estimate of the radius of the convergence ball. Some applications for solving constrained nonlinear equations are discussed and the numerical performance of the method is assessed on some significant test problems.
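As a rough one-dimensional illustration of the idea, a proximal Gauss-Newton-style step can be read as: linearize the residual F at the current point, keep the penalty J as it is, and minimize the resulting model. In one dimension this reduces to a Newton step on F followed by a proximal map of J weighted by F'(x)^2. The sketch assumes J is the indicator of an interval (so the proximal map is a clip), and all names and the test equation are hypothetical rather than taken from the paper.

```python
def prox_gauss_newton_1d(F, dF, prox, x0, n_iters):
    """Iterate x <- prox(x - F(x) / F'(x), F'(x)^2) for a scalar residual F.

    Each step minimizes 0.5 * (F(x_k) + F'(x_k) * (x - x_k))^2 + J(x),
    assuming F'(x_k) is nonzero along the iteration.
    """
    x = x0
    for _ in range(n_iters):
        slope = dF(x)
        z = x - F(x) / slope           # unconstrained Gauss-Newton step
        x = prox(z, slope * slope)     # proximal map of J in the metric F'(x)^2
    return x


def box_projection(lo, hi):
    """Proximal map of the indicator of [lo, hi]; the metric weight is irrelevant in 1D."""
    def prox(z, weight):
        return min(max(z, lo), hi)
    return prox


def residual(x):
    return x * x - 2.0


def residual_slope(x):
    return 2.0 * x


# Constraint inactive at the solution: converges to sqrt(2).
x_free = prox_gauss_newton_1d(residual, residual_slope, box_projection(0.0, 3.0), 3.0, 20)
# Constraint active: the best feasible point is the lower bound 1.5.
x_bound = prox_gauss_newton_1d(residual, residual_slope, box_projection(1.5, 3.0), 3.0, 20)
```

The second call shows the constrained-equation use case mentioned in the abstract: when the root of F is infeasible, the iteration settles on the feasible point with the smallest residual.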