Source author record

Zaid Harchaoui

Zaid Harchaoui appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computer Vision math.OC math.ST Statistics Theory math.PR Computation Cryptography and Security Information Theory math.IT

Catalog footprint

What is connected

25works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Asymptotics of Discrete Schrödinger Bridges via Chaos Decomposition

Consider the problem of matching two independent i.i.d. samples of size $N$ from two distributions $P$ and $Q$ in $\mathbb{R}^d$. For an arbitrary continuous cost function, the optimal assignment problem looks for the matching that minimizes the total cost. We consider instead in this paper the problem where each matching is endowed with a Gibbs probability weight proportional to the exponential of the negative total cost of that matching. Viewing each matching as a joint distribution with $N$ atoms, we then take a convex combination with respect to the above Gibbs probability measure. We show that this resulting random joint distribution converges, as $N\rightarrow \infty$, to the solution of a variational problem, introduced by Föllmer, called the Schrödinger problem. We also derive the first two error terms of orders $N^{-1/2}$ and $N^{-1}$, respectively. This gives us central limit theorems for integrated test functions, including for the cost of transport, and second order Gaussian chaos limits when the limiting Gaussian variance is zero. The proofs are based on a novel chaos decomposition of the discrete Schrödinger bridge by polynomial functions of the pair of empirical distributions as the first and second order Taylor approximations in the space of measures. This is achieved by extending the Hoeffding decomposition from the classical theory of U-statistics.

preprint2022arXiv

Confidence Sets under Generalized Self-Concordance

This paper revisits a fundamental problem in statistical inference from a non-asymptotic theoretical viewpoint $\unicode{x2013}$ the construction of confidence sets. We establish a finite-sample bound for the estimator, characterizing its asymptotic behavior in a non-asymptotic fashion. An important feature of our bound is that its dimension dependency is captured by the effective dimension $\unicode{x2013}$ the trace of the limiting sandwich covariance $\unicode{x2013}$ which can be much smaller than the parameter dimension in some regimes. We then illustrate how the bound can be used to obtain a confidence set whose shape is adapted to the optimization landscape induced by the loss function. Unlike previous works that rely heavily on the strong convexity of the loss function, we only assume the Hessian is lower bounded at optimum and allow it to gradually becomes degenerate. This property is formalized by the notion of generalized self-concordance which originated from convex optimization. Moreover, we demonstrate how the effective dimension can be estimated from data and characterize its estimation accuracy. We apply our results to maximum likelihood estimation with generalized linear models, score matching with exponential families, and hypothesis testing with Rao's score test.

preprint2022arXiv

Entropy Regularized Optimal Transport Independence Criterion

We introduce an independence criterion based on entropy regularized optimal transport. Our criterion can be used to test for independence between two samples. We establish non-asymptotic bounds for our test statistic and study its statistical behavior under both the null hypothesis and the alternative hypothesis. The theoretical results involve tools from U-process theory and optimal transport theory. We also offer a random feature type approximation for large-scale problems, as well as a differentiable program implementation for deep learning applications. We present experimental results on existing benchmarks for independence testing, illustrating the interest of the proposed criterion to capture both linear and nonlinear dependencies in synthetic data and real data.

preprint2022arXiv

Orthogonal Statistical Learning with Self-Concordant Loss

Orthogonal statistical learning and double machine learning have emerged as general frameworks for two-stage statistical prediction in the presence of a nuisance component. We establish non-asymptotic bounds on the excess risk of orthogonal statistical learning methods with a loss function satisfying a self-concordance property. Our bounds improve upon existing bounds by a dimension factor while lifting the assumption of strong convexity. We illustrate the results with examples from multiple treatment effect estimation and generalized partially linear modeling.

preprint2022arXiv

Superquantile-based learning: a direct approach using gradient-based optimization

We consider a formulation of supervised learning that endows models with robustness to distributional shifts from training to testing. The formulation hinges upon the superquantile risk measure, also known as the conditional value-at-risk, which has shown promise in recent applications of machine learning and signal processing. We show that, thanks to a direct smoothing of the superquantile function, a superquantile-based learning objective is amenable to gradient-based optimization, using batch optimization algorithms such as gradient descent or quasi-Newton algorithms, or using stochastic optimization algorithms such as stochastic gradient algorithms. A companion software SPQR implements in Python the algorithms described and allows practitioners to experiment with superquantile-based supervised learning.

preprint2022arXiv

Superquantiles at Work: Machine Learning Applications and Efficient Subgradient Computation

R. Tyrell Rockafellar and collaborators introduced, in a series of works, new regression modeling methods based on the notion of superquantile (or conditional value-at-risk). These methods have been influential in economics, finance, management science, and operations research in general. Recently, they have been the subject of a renewed interest in machine learning, to address issues of distributional robustness and fair allocation. In this paper, we review some of these new applications of the superquantile, with references to recent developments. These applications involve nonsmooth superquantile-based objective functions that admit explicit subgradient calculations. To make these superquantile-based functions amenable to the gradient-based algorithms popular in machine learning, we show how to smooth them by infimal convolution and describe numerical procedures to compute the gradients of the smooth approximations. We put the approach into perspective by comparing it to other smoothing techniques and by illustrating it on toy examples.

preprint2021arXiv

A Spectral Analysis of Dot-product Kernels

We present eigenvalue decay estimates of integral operators associated with compositional dot-product kernels. The estimates improve on previous ones established for power series kernels on spheres. This allows us to obtain the volumes of balls in the corresponding reproducing kernel Hilbert spaces. We discuss the consequences on statistical estimation with compositional dot product kernels and highlight interesting trade-offs between the approximation error and the statistical error depending on the number of compositions and the smoothness of the kernels.

preprint2021arXiv

Adaptive Denoising of Signals with Local Shift-Invariant Structure

We discuss the problem of adaptive discrete-time signal denoising in the situation where the signal to be recovered admits a "linear oracle" -- an unknown linear estimate that takes the form of convolution of observations with a time-invariant filter. It was shown by Juditsky and Nemirovski (2009) that when the $\ell_2$-norm of the oracle filter is small enough, such oracle can be "mimicked" by an efficiently computable adaptive estimate of the same structure with an observation-driven filter. The filter in question was obtained as a solution to the optimization problem in which the $\ell_\infty$-norm of the Discrete Fourier Transform (DFT) of the estimation residual is minimized under constraint on the $\ell_1$-norm of the filter DFT. In this paper, we discuss a new family of adaptive estimates which rely upon minimizing the $\ell_2$-norm of the estimation residual. We show that such estimators possess better statistical properties than those based on $\ell_\infty$-fit; in particular, we prove oracle inequalities for their $\ell_2$-loss and improved bounds for $\ell_2$- and pointwise losses. The oracle inequalities rely on the "approximate shift-invariance" assumption stating that the signal to be recovered is close to an (unknown) shift-invariant subspace. We also study the relationship of the approximate shift-invariance assumption with the "signal simplicity" assumption introduced in Juditsky and Nemirovski (2009) and discuss the application of the proposed approach to harmonic oscillations denoising.

preprint2021arXiv

Advances and Open Problems in Federated Learning

Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.

preprint2021arXiv

Triangular Flows for Generative Modeling: Statistical Consistency, Smoothness Classes, and Fast Rates

Triangular flows, also known as Knöthe-Rosenblatt measure couplings, comprise an important building block of normalizing flow models for generative modeling and density estimation, including popular autoregressive flow models such as real-valued non-volume preserving transformation models (Real NVP). We present statistical guarantees and sample complexity bounds for triangular flow statistical models. In particular, we establish the statistical consistency and the finite sample convergence rates of the Kullback-Leibler estimator of the Knöthe-Rosenblatt measure coupling using tools from empirical process theory. Our results highlight the anisotropic geometry of function classes at play in triangular flows, shed light on optimal coordinate ordering, and lead to statistical guarantees for Jacobian flows. We conduct numerical experiments on synthetic data to illustrate the practical implications of our theoretical findings.

preprint2020arXiv

An Elementary Approach to Convergence Guarantees of Optimization Algorithms for Deep Networks

We present an approach to obtain convergence guarantees of optimization algorithms for deep networks based on elementary arguments and computations. The convergence analysis revolves around the analytical and computational structures of optimization oracles central to the implementation of deep networks in machine learning software. We provide a systematic way to compute estimates of the smoothness constants that govern the convergence behavior of first-order optimization algorithms used to train deep networks. A diverse set of example components and architectures arising in modern deep networks intersperse the exposition to illustrate the approach.

preprint2016arXiv

Convolutional Patch Representations for Image Retrieval: an Unsupervised Approach

Convolutional neural networks (CNNs) have recently received a lot of attention due to their ability to model local stationary structures in natural images in a multi-scale fashion, when learning all model parameters with supervision. While excellent performance was achieved for image classification when large amounts of labeled visual data are available, their success for un-supervised tasks such as image retrieval has been moderate so far. Our paper focuses on this latter setting and explores several methods for learning patch descriptors without supervision with application to matching and instance-level retrieval. To that effect, we propose a new family of convolutional descriptors for patch representation , based on the recently introduced convolutional kernel networks. We show that our descriptor, named Patch-CKN, performs better than SIFT as well as other convolutional networks learned by artificially introducing supervision and is significantly faster to train. To demonstrate its effectiveness, we perform an extensive evaluation on standard benchmarks for patch and image retrieval where we obtain state-of-the-art results. We also introduce a new dataset called RomePatches, which allows to simultaneously study descriptor performance for patch and image retrieval.

preprint2016arXiv

Fast and Simple Optimization for Poisson Likelihood Models

Poisson likelihood models have been prevalently used in imaging, social networks, and time series analysis. We propose fast, simple, theoretically-grounded, and versatile, optimization algorithms for Poisson likelihood modeling. The Poisson log-likelihood is concave but not Lipschitz-continuous. Since almost all gradient-based optimization algorithms rely on Lipschitz-continuity, optimizing Poisson likelihood models with a guarantee of convergence can be challenging, especially for large-scale problems. We present a new perspective allowing to efficiently optimize a wide range of penalized Poisson likelihood objectives. We show that an appropriate saddle point reformulation enjoys a favorable geometry and a smooth structure. Therefore, we can design a new gradient-based optimization algorithm with $O(1/t)$ convergence rate, in contrast to the usual $O(1/\sqrt{t})$ rate of non-smooth minimization alternatives. Furthermore, in order to tackle problems with large samples, we also develop a randomized block-decomposition variant that enjoys the same convergence rate yet more efficient iteration cost. Experimental results on several point process applications including social network estimation and temporal recommendation show that the proposed algorithm and its randomized block variant outperform existing methods both on synthetic and real-world datasets.

preprint2015arXiv

A Universal Catalyst for First-Order Optimization

We introduce a generic scheme for accelerating first-order optimization methods in the sense of Nesterov, which builds upon a new analysis of the accelerated proximal point algorithm. Our approach consists of minimizing a convex objective by approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. This strategy applies to a large class of algorithms, including gradient descent, block coordinate descent, SAG, SAGA, SDCA, SVRG, Finito/MISO, and their proximal variants. For all of these methods, we provide acceleration and explicit support for non-strongly convex objectives. In addition to theoretical speed-up, we also show that acceleration is useful in practice, especially for ill-conditioned problems where we measure significant improvements.

preprint2015arXiv

Beat-Event Detection in Action Movie Franchises

While important advances were recently made towards temporally localizing and recognizing specific human actions or activities in videos, efficient detection and classification of long video chunks belonging to semantically defined categories such as "pursuit" or "romance" remains challenging.We introduce a new dataset, Action Movie Franchises, consisting of a collection of Hollywood action movie franchises. We define 11 non-exclusive semantic categories - called beat-categories - that are broad enough to cover most of the movie footage. The corresponding beat-events are annotated as groups of video shots, possibly overlapping.We propose an approach for localizing beat-events based on classifying shots into beat-categories and learning the temporal constraints between shots. We show that temporal constraints significantly improve the classification performance. We set up an evaluation protocol for beat-event localization as well as for shot classification, depending on whether movies from the same franchise are present or not in the training data.

preprint2015arXiv

DeepMatching: Hierarchical Deformable Dense Matching

We introduce a novel matching algorithm, called DeepMatching, to compute dense correspondences between images. DeepMatching relies on a hierarchical, multi-layer, correlational architecture designed for matching images and was inspired by deep convolutional approaches. The proposed matching algorithm can handle non-rigid deformations and repetitive textures and efficiently determines dense correspondences in the presence of significant changes between images. We evaluate the performance of DeepMatching, in comparison with state-of-the-art matching algorithms, on the Mikolajczyk (Mikolajczyk et al 2005), the MPI-Sintel (Butler et al 2012) and the Kitti (Geiger et al 2013) datasets. DeepMatching outperforms the state-of-the-art algorithms and shows excellent results in particular for repetitive textures.We also propose a method for estimating optical flow, called DeepFlow, by integrating DeepMatching in the large displacement optical flow (LDOF) approach of Brox and Malik (2011). Compared to existing matching algorithms, additional robustness to large displacements and complex motion is obtained thanks to our matching approach. DeepFlow obtains competitive performance on public benchmarks for optical flow estimation.

preprint2015arXiv

EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow

We propose a novel approach for optical flow estimation , targeted at large displacements with significant oc-clusions. It consists of two steps: i) dense matching by edge-preserving interpolation from a sparse set of matches; ii) variational energy minimization initialized with the dense matches. The sparse-to-dense interpolation relies on an appropriate choice of the distance, namely an edge-aware geodesic distance. This distance is tailored to handle occlusions and motion boundaries -- two common and difficult issues for optical flow computation. We also propose an approximation scheme for the geodesic distance to allow fast computation without loss of performance. Subsequent to the dense interpolation step, standard one-level variational energy minimization is carried out on the dense matches to obtain the final flow estimation. The proposed approach, called Edge-Preserving Interpolation of Correspondences (EpicFlow) is fast and robust to large displacements. It significantly outperforms the state of the art on MPI-Sintel and performs on par on Kitti and Middlebury.

preprint2015arXiv

Label-Embedding for Image Classification

Attributes act as intermediate representations that enable parameter sharing between classes, a must when training data is scarce. We propose to view attribute-based image classification as a label-embedding problem: each class is embedded in the space of attribute vectors. We introduce a function that measures the compatibility between an image and a label embedding. The parameters of this function are learned on a training set of labeled samples to ensure that, given an image, the correct classes rank higher than the incorrect ones. Results on the Animals With Attributes and Caltech-UCSD-Birds datasets show that the proposed framework outperforms the standard Direct Attribute Prediction baseline in a zero-shot learning scenario. Label embedding enjoys a built-in ability to leverage alternative sources of information instead of or in addition to attributes, such as e.g. class hierarchies or textual descriptions. Moreover, label embedding encompasses the whole range of learning settings from zero-shot learning to regular learning with a large number of labeled examples.

preprint2015arXiv

Learning to track for spatio-temporal action localization

We propose an effective approach for spatio-temporal action localization in realistic videos. The approach first detects proposals at the frame-level and scores them with a combination of static and motion CNN features. It then tracks high-scoring proposals throughout the video using a tracking-by-detection approach. Our tracker relies simultaneously on instance-level and class-level detectors. The tracks are scored using a spatio-temporal motion histogram, a descriptor at the track level, in combination with the CNN features. Finally, we perform temporal localization of the action using a sliding-window approach at the track level. We present experimental results for spatio-temporal localization on the UCF-Sports, J-HMDB and UCF-101 action localization datasets, where our approach outperforms the state of the art with a margin of 15%, 7% and 12% respectively in mAP.

preprint2015arXiv

Semi-proximal Mirror-Prox for Nonsmooth Composite Minimization

We propose a new first-order optimisation algorithm to solve high-dimensional non-smooth composite minimisation problems. Typical examples of such problems have an objective that decomposes into a non-smooth empirical risk part and a non-smooth regularisation penalty. The proposed algorithm, called Semi-Proximal Mirror-Prox, leverages the Fenchel-type representation of one part of the objective while handling the other part of the objective via linear minimization over the domain. The algorithm stands in contrast with more classical proximal gradient algorithms with smoothing, which require the computation of proximal operators at each iteration and can therefore be impractical for high-dimensional problems. We establish the theoretical convergence rate of Semi-Proximal Mirror-Prox, which exhibits the optimal complexity bounds, i.e. $O(1/ε^2)$, for the number of calls to linear minimization oracle. We present promising experimental results showing the interest of the approach in comparison to competing methods.

preprint2014arXiv

Convolutional Kernel Networks

An important goal in visual recognition is to devise image representations that are invariant to particular transformations. In this paper, we address this goal with a new type of convolutional neural network (CNN) whose invariance is encoded by a reproducing kernel. Unlike traditional approaches where neural networks are learned either to represent data or for solving a classification task, our network learns to approximate the kernel feature map on training data. Such an approach enjoys several benefits over classical ones. First, by teaching CNNs to be invariant, we obtain simple network architectures that achieve a similar accuracy to more complex ones, while being easy to train and robust to overfitting. Second, we bridge a gap between the neural network literature and kernels, which are natural tools to model invariance. We evaluate our methodology on visual recognition tasks where CNNs have proven to perform well, e.g., digit recognition with the MNIST dataset, and the more challenging CIFAR-10 and STL-10 datasets, where our accuracy is competitive with the state of the art.

preprint2014arXiv

Fast and Robust Archetypal Analysis for Representation Learning

We revisit a pioneer unsupervised learning technique called archetypal analysis, which is related to successful data analysis methods such as sparse coding and non-negative matrix factorization. Since it was proposed, archetypal analysis did not gain a lot of popularity even though it produces more interpretable models than other alternatives. Because no efficient implementation has ever been made publicly available, its application to important scientific problems may have been severely limited. Our goal is to bring back into favour archetypal analysis. We propose a fast optimization scheme using an active-set strategy, and provide an efficient open-source implementation interfaced with Matlab, R, and Python. Then, we demonstrate the usefulness of archetypal analysis for computer vision tasks, such as codebook learning, signal classification, and large image collection visualization.

preprint2014arXiv

High-dimensional change-point detection with sparse alternatives

We consider the problem of detecting a change in mean in a sequence of Gaussian vectors. Under the alternative hypothesis, the change occurs only in some subset of the components of the vector. We propose a test of the presence of a change-point that is adaptive to the number of changing components. Under the assumption that the vector dimension tends to infinity and the length of the sequence grows slower than the dimension of the signal, we obtain the detection boundary for this problem and prove its rate-optimality.

preprint2014arXiv

On learning to localize objects with minimal supervision

Learning to localize objects with minimal supervision is an important problem in computer vision, since large fully annotated datasets are extremely costly to obtain. In this paper, we propose a new method that achieves this goal with only image-level labels of whether the objects are present or not. Our approach combines a discriminative submodular cover problem for automatically discovering a set of positive object windows with a smoothed latent SVM formulation. The latter allows us to leverage efficient quasi-Newton optimization techniques. Our experiments demonstrate that the proposed approach provides a 50% relative improvement in mean average precision over the current state-of-the-art on PASCAL VOC 2007 detection.

preprint2013arXiv

Conditional Gradient Algorithms for Norm-Regularized Smooth Convex Optimization

Motivated by some applications in signal processing and machine learning, we consider two convex optimization problems where, given a cone $K$, a norm $\|\cdot\|$ and a smooth convex function $f$, we want either 1) to minimize the norm over the intersection of the cone and a level set of $f$, or 2) to minimize over the cone the sum of $f$ and a multiple of the norm. We focus on the case where (a) the dimension of the problem is too large to allow for interior point algorithms, (b) $\|\cdot\|$ is "too complicated" to allow for computationally cheap Bregman projections required in the first-order proximal gradient algorithms. On the other hand, we assume that {it is relatively easy to minimize linear forms over the intersection of $K$ and the unit $\|\cdot\|$-ball}. Motivating examples are given by the nuclear norm with $K$ being the entire space of matrices, or the positive semidefinite cone in the space of symmetric matrices, and the Total Variation norm on the space of 2D images. We discuss versions of the Conditional Gradient algorithm capable to handle our problems of interest, provide the related theoretical efficiency estimates and outline some applications.

Zaid Harchaoui

What is connected

Connect this record

See the researcher in context

Building this map preview

25 published item(s)

Asymptotics of Discrete Schrödinger Bridges via Chaos Decomposition

Confidence Sets under Generalized Self-Concordance

Entropy Regularized Optimal Transport Independence Criterion

Orthogonal Statistical Learning with Self-Concordant Loss

Superquantile-based learning: a direct approach using gradient-based optimization

Superquantiles at Work: Machine Learning Applications and Efficient Subgradient Computation

A Spectral Analysis of Dot-product Kernels

Adaptive Denoising of Signals with Local Shift-Invariant Structure

Advances and Open Problems in Federated Learning

Triangular Flows for Generative Modeling: Statistical Consistency, Smoothness Classes, and Fast Rates

An Elementary Approach to Convergence Guarantees of Optimization Algorithms for Deep Networks

Convolutional Patch Representations for Image Retrieval: an Unsupervised Approach

Fast and Simple Optimization for Poisson Likelihood Models

A Universal Catalyst for First-Order Optimization

Beat-Event Detection in Action Movie Franchises

DeepMatching: Hierarchical Deformable Dense Matching

EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow

Label-Embedding for Image Classification

Learning to track for spatio-temporal action localization

Semi-proximal Mirror-Prox for Nonsmooth Composite Minimization

Convolutional Kernel Networks

Fast and Robust Archetypal Analysis for Representation Learning

High-dimensional change-point detection with sparse alternatives

On learning to localize objects with minimal supervision

Conditional Gradient Algorithms for Norm-Regularized Smooth Convex Optimization