Source author record

Mohamed Hebiri

Mohamed Hebiri appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

math.ST, Statistics Theory, Machine Learning, Neural and Evolutionary Computing

Catalog footprint

What is connected

15 works

4 topics

4 close collaborators

preprint 2022 arXiv

Active learning algorithm through the lens of rejection arguments

Active learning is a paradigm of machine learning that aims at reducing the amount of labeled data needed to train a classifier. Its overall principle is to sequentially select the most informative data points, which amounts to determining the uncertainty of regions of the input space. The main challenge lies in building a procedure that is computationally efficient and that offers appealing theoretical properties; most of the current methods satisfy only one or the other. In this paper, we use classification with rejection in a novel way to estimate the uncertain regions. We provide an active learning algorithm and prove its theoretical benefits under classical assumptions. In addition to the theoretical results, numerical experiments have been carried out on synthetic and non-synthetic datasets. These experiments provide empirical evidence that the use of rejection arguments in our active learning algorithm is beneficial and allows good performance in various statistical situations.

preprint 2022 arXiv

Prediction intervals with controlled length in the heteroscedastic Gaussian regression

We tackle the problem of building a prediction interval in heteroscedastic Gaussian regression. We focus on prediction intervals with constrained expected length in order to guarantee interpretability of the output. In this framework, we derive a closed-form expression of the optimal prediction interval that allows for the development of a data-driven prediction interval based on plug-in. The construction of the proposed algorithm is based on two samples, one labeled and another unlabeled. Under mild conditions, we show that our procedure is asymptotically as good as the optimal prediction interval both in terms of expected length and error rate. In particular, the control of the expected length is distribution-free. We also derive rates of convergence under smoothness and the Tsybakov noise conditions. We conduct a numerical analysis that exhibits the good performance of our method. It also indicates that even with a small amount of unlabeled data, our method is very effective in enforcing the length constraint.

preprint 2021 arXiv

Regression with reject option and application to kNN

We investigate the problem of regression where one is allowed to abstain from predicting. We refer to this framework as regression with reject option, as an extension of classification with reject option. In this context, we focus on the case where the rejection rate is fixed and derive the optimal rule, which relies on thresholding the conditional variance function. We provide a semi-supervised estimation procedure of the optimal rule involving two datasets: a first labeled dataset is used to estimate both the regression function and the conditional variance function, while a second unlabeled dataset is exploited to calibrate the desired rejection rate. The resulting predictor with reject option is shown to be almost as good as the optimal predictor with reject option both in terms of risk and rejection rate. We additionally apply our methodology to the kNN algorithm and establish rates of convergence for the resulting kNN predictor under mild conditions. Finally, a numerical study is performed to illustrate the benefit of using the proposed procedure.
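The two-dataset procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `knn_mean` and `fit_reject_rule`, the in-sample residuals used as a variance proxy, and the default `k=10` are all assumptions made for the example.

```python
import numpy as np

def knn_mean(X_train, values, X_query, k):
    """Average `values` over the k nearest training points of each query point."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]
    return values[idx].mean(axis=1)

def fit_reject_rule(X_lab, y_lab, X_unlab, reject_rate, k=10):
    """Return a predictor that abstains where the estimated variance is large."""
    # Labeled data: squared residuals of the kNN regression estimate
    # (hypothetical choice: in-sample residuals, no sample splitting).
    resid2 = (y_lab - knn_mean(X_lab, y_lab, X_lab, k)) ** 2
    # Unlabeled data: calibrate the threshold as an empirical quantile
    # of the estimated conditional variance.
    var_unlab = knn_mean(X_lab, resid2, X_unlab, k)
    threshold = np.quantile(var_unlab, 1.0 - reject_rate)

    def predict(X_new):
        pred = knn_mean(X_lab, y_lab, X_new, k)
        var = knn_mean(X_lab, resid2, X_new, k)
        # Second output is True where the predictor abstains.
        return pred, var > threshold

    return predict
```

Calling `predict` on new inputs returns the kNN predictions together with a boolean mask of rejected points; only the unlabeled sample is used to set the threshold, which mirrors the calibration step in the abstract.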

preprint 2021 arXiv

Set-valued classification -- overview via a unified framework

The multi-class classification problem is among the most popular and well-studied statistical frameworks. Modern multi-class datasets can be extremely ambiguous, and single-output predictions fail to deliver satisfactory performance. By allowing predictors to predict a set of label candidates, set-valued classification offers a natural way to deal with this ambiguity. Several formulations of set-valued classification are available in the literature and each of them leads to different prediction strategies. The present survey aims to review popular formulations using a unified statistical framework. The proposed framework encompasses previously considered formulations, leads to new ones, and clarifies the underlying trade-offs of each formulation. We provide infinite sample optimal set-valued classification strategies and review a general plug-in principle to construct data-driven algorithms. The exposition is supported by examples and pointers to both theoretical and practical contributions. Finally, we provide experiments on real-world datasets comparing these approaches in practice and providing general practical guidelines.
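To make the plug-in idea concrete, here is a small sketch of two common set-valued rules applied to estimated class probabilities. The rules shown (a fixed-size top-k set and a probability threshold) and the function names are illustrative assumptions; they are not claimed to be the exact formulations reviewed in the survey.

```python
import numpy as np

def top_k_sets(probs, k):
    """Boolean mask keeping, per row, the k labels with the largest probability."""
    order = np.argsort(-probs, axis=1)[:, :k]
    sets = np.zeros_like(probs, dtype=bool)
    np.put_along_axis(sets, order, True, axis=1)
    return sets

def threshold_sets(probs, t):
    """Boolean mask keeping every label whose estimated probability is at least t."""
    return probs >= t
```

The first rule fixes the size of every predicted set, while the second lets the size vary with the ambiguity of the input; in both cases `probs` would come from any probabilistic classifier fitted beforehand.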

preprint 2020 arXiv

Fair Regression with Wasserstein Barycenters

We study the problem of learning a real-valued function that satisfies the Demographic Parity constraint. It demands the distribution of the predicted output to be independent of the sensitive attribute. We consider the case where the sensitive attribute is available for prediction. We establish a connection between fair regression and optimal transport theory, based on which we derive a closed-form expression for the optimal fair predictor. Specifically, we show that the distribution of this optimum is the Wasserstein barycenter of the distributions induced by the standard regression function on the sensitive groups. This result offers an intuitive interpretation of the optimal fair prediction and suggests a simple post-processing algorithm to achieve fairness. We establish risk and distribution-free fairness guarantees for this procedure. Numerical experiments indicate that our method is very effective in learning fair models, with a relative increase in error rate that is smaller than the relative gain in fairness.
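A post-processing step of this kind can be sketched for one-dimensional predictions, where the Wasserstein barycenter amounts to averaging group-wise quantile functions. The sketch below is a simplified illustration under that assumption: the name `fair_postprocess` is hypothetical, the same sample is used for estimation and evaluation, and ties between predictions are not handled specially.

```python
import numpy as np

def fair_postprocess(preds, groups):
    """Map base predictions to the weighted average of group quantile functions."""
    preds = np.asarray(preds, dtype=float)
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()  # empirical group frequencies
    fair = np.empty_like(preds)
    for s in labels:
        mask = groups == s
        own = np.sort(preds[mask])
        # Empirical CDF level of each prediction within its own group.
        u = np.searchsorted(own, preds[mask], side="right") / own.size
        # Weighted average of every group's empirical quantile at level u.
        fair[mask] = sum(
            w * np.quantile(preds[groups == t], u)
            for t, w in zip(labels, weights)
        )
    return fair
```

After this transformation the empirical distribution of the output is approximately the same in every group, which is what Demographic Parity asks for; the ranking of predictions inside each group is preserved.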

preprint 2020 arXiv

Layer Sparsity in Neural Networks

Sparsity has become popular in machine learning, because it can save computational resources, facilitate interpretations, and prevent overfitting. In this paper, we discuss sparsity in the framework of neural networks. In particular, we formulate a new notion of sparsity that concerns the networks' layers and, therefore, aligns particularly well with the current trend toward deep networks. We call this notion layer sparsity. We then introduce corresponding regularization and refitting schemes that can complement standard deep-learning pipelines to generate more compact and accurate networks.

preprint 2020 arXiv

Leveraging Labeled and Unlabeled Data for Consistent Fair Binary Classification

We study the problem of fair binary classification using the notion of Equal Opportunity. It requires the true positive rate to be equal across the sensitive groups. Within this setting we show that the fair optimal classifier is obtained by recalibrating the Bayes classifier by a group-dependent threshold. We provide a constructive expression for the threshold. This result motivates us to devise a plug-in classification procedure based on both unlabeled and labeled datasets. While the latter is used to learn the output conditional probability, the former is used for calibration. The overall procedure can be computed in polynomial time and it is shown to be statistically consistent both in terms of the classification error and fairness measure. Finally, we present numerical experiments which indicate that our method is often superior to or competitive with the state-of-the-art methods on benchmark datasets.

preprint 2016 arXiv

On the Prediction Performance of the Lasso

Although the Lasso has been extensively studied, the relationship between its prediction performance and the correlations of the covariates is not fully understood. In this paper, we give new insights into this relationship in the context of multiple linear regression. We show, in particular, that the incorporation of a simple correlation measure into the tuning parameter can lead to a nearly optimal prediction performance of the Lasso even for highly correlated covariates. However, we also reveal that for moderately correlated covariates, the prediction performance of the Lasso can be mediocre irrespective of the choice of the tuning parameter. We finally show that our results also lead to near-optimal rates for the least-squares estimator with total variation penalty.

preprint 2015 arXiv

Consistency of plug-in confidence sets for classification in semi-supervised learning

Confident prediction is highly relevant in machine learning; for example, in applications such as medical diagnoses, a wrong prediction can be fatal. For classification, there already exist procedures that allow one to abstain from classifying data when the confidence in their prediction is weak. This approach is known as classification with reject option. In the present paper, we provide new methodology for this approach. Predicting a new instance via a confidence set, we ensure an exact control of the probability of classification. Moreover, we show that this methodology is easily implementable and entails attractive theoretical and numerical properties.

preprint 2013 arXiv

Learning Heteroscedastic Models by Convex Programming under Group Sparsity

Popular sparse estimation methods based on $\ell_1$-relaxation, such as the Lasso and the Dantzig selector, require the knowledge of the variance of the noise in order to properly tune the regularization parameter. This constitutes a major obstacle in applying these methods in several frameworks (such as time series, random fields, inverse problems) for which the noise is rarely homoscedastic and its level is hard to know in advance. In this paper, we propose a new approach to the joint estimation of the conditional mean and the conditional variance in a high-dimensional (auto-)regression setting. An attractive feature of the proposed estimator is that it is efficiently computable even for very large scale problems by solving a second-order cone program (SOCP). We present theoretical analysis and numerical results assessing the performance of the proposed procedure.

preprint 2013 arXiv

Rank penalized estimation of a quantum system

We introduce a new method to reconstruct the density matrix $ρ$ of a system of $n$ qubits and estimate its rank $d$ from data obtained by quantum state tomography measurements repeated $m$ times. The procedure consists in minimizing the risk of a linear estimator $\hat{ρ}$ of $ρ$ penalized by a given rank (from 1 to $2^n$), where $\hat{ρ}$ is previously obtained by the moment method. We obtain simultaneously an estimator of the rank and the resulting density matrix associated with this rank. We establish an upper bound for the error of the penalized estimator, evaluated with the Frobenius norm, which is of order $dn(4/3)^n /m$, and consistency for the estimator of the rank. The proposed methodology is computationally efficient and is illustrated with some example states and real experimental data sets.

preprint 2012 arXiv

How Correlations Influence Lasso Prediction

We study how correlations in the design matrix influence Lasso prediction. First, we argue that the higher the correlations are, the smaller the optimal tuning parameter is. This implies in particular that the standard tuning parameters, which do not depend on the design matrix, are not favorable. Furthermore, we argue that Lasso prediction works well for any degree of correlations if suitable tuning parameters are chosen. We study these two subjects theoretically as well as with simulations.

preprint 2011 arXiv

Generalization of l1 constraints for high dimensional regression problems

We focus on the high dimensional linear regression $Y\sim\mathcal{N}(Xβ^{*},σ^{2}I_{n})$, where $β^{*}\in\mathbb{R}^{p}$ is the parameter of interest. In this setting, several estimators such as the LASSO and the Dantzig Selector are known to satisfy interesting properties whenever the vector $β^{*}$ is sparse. Interestingly, both the LASSO and the Dantzig Selector can be seen as orthogonal projections of 0 into $\mathcal{DC}(s)=\{β\in\mathbb{R}^{p},\|X'(Y-Xβ)\|_{\infty}\leq s\}$, using an $\ell_{1}$ distance for the Dantzig Selector and $\ell_{2}$ for the LASSO. For a well chosen $s>0$, this set is actually a confidence region for $β^{*}$. In this paper, we investigate the properties of estimators defined as projections on $\mathcal{DC}(s)$ using general distances. We prove that the obtained estimators satisfy oracle properties close to those of the LASSO and Dantzig Selector. On top of that, it turns out that these estimators can be tuned to exploit a different sparsity and/or slightly different estimation objectives.

preprint 2011 arXiv

The Smooth-Lasso and other $\ell_1+\ell_2$-penalized methods

We consider a linear regression problem in a high dimensional setting where the number of covariates $p$ can be much larger than the sample size $n$. In such a situation, one often assumes sparsity of the regression vector, i.e., the regression vector contains many zero components. We propose a Lasso-type estimator $\hat{β}^{Quad}$ (where '$Quad$' stands for quadratic) which is based on two penalty terms. The first one is the $\ell_1$ norm of the regression coefficients used to exploit the sparsity of the regression as done by the Lasso estimator, whereas the second is a quadratic penalty term introduced to capture some additional information on the setting of the problem. We detail two special cases: the Elastic-Net $\hat{β}^{EN}$, which deals with sparse problems where correlations between variables may exist; and the Smooth-Lasso $\hat{β}^{SL}$, which responds to sparse problems where successive regression coefficients are known to vary slowly (in some situations, this can also be interpreted in terms of correlations between successive variables). From a theoretical point of view, we establish variable selection consistency results and show that $\hat{β}^{Quad}$ achieves a Sparsity Inequality, i.e., a bound in terms of the number of non-zero components of the 'true' regression vector. These results are provided under a weaker assumption on the Gram matrix than the one used by the Lasso. In some situations this guarantees a significant improvement over the Lasso. Furthermore, a simulation study is conducted and shows that the S-Lasso $\hat{β}^{SL}$ performs better than known methods such as the Lasso, the Elastic-Net $\hat{β}^{EN}$, and the Fused-Lasso with respect to the estimation accuracy. This is especially the case when the regression vector is 'smooth', i.e., when the variations between successive coefficients of the unknown parameter of the regression are small. The study also reveals that the theoretical calibration of the tuning parameters and the one based on 10-fold cross validation imply two S-Lasso solutions with close performance.
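One plausible reading of the two-penalty criterion is least squares plus an $\ell_1$ term plus a quadratic penalty on successive differences of the coefficients. The sketch below minimizes such a criterion with a basic proximal gradient (ISTA) loop. The exact normalization (factors of 1/2, no division by $n$), the solver choice, and the name `smooth_lasso_ista` are assumptions for illustration; the abstract does not specify them.

```python
import numpy as np

def smooth_lasso_ista(X, y, lam, mu, n_iter=500):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 + 0.5*mu*sum_j (b_j - b_{j-1})^2."""
    n, p = X.shape
    # First-difference matrix of shape (p-1, p): (D b)_j = b_{j+1} - b_j.
    D = np.diff(np.eye(p), axis=0)
    # Hessian of the smooth part (least squares + quadratic penalty).
    H = X.T @ X + mu * (D.T @ D)
    # Step size 1/L, with L the largest eigenvalue of H.
    step = 1.0 / np.linalg.eigvalsh(H)[-1]
    b_lin = X.T @ y
    beta = np.zeros(p)
    for _ in range(n_iter):
        # Gradient step on the smooth part.
        z = beta - step * (H @ beta - b_lin)
        # Soft-thresholding, the proximal operator of the l1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta
```

Setting `mu=0` reduces the criterion to a plain Lasso objective, and replacing `D.T @ D` by the identity would give an Elastic-Net-style quadratic term, which matches the abstract's description of both as special cases of a quadratic second penalty.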

preprint 2010 arXiv

Transductive versions of the LASSO and the Dantzig Selector

Transductive methods are useful in prediction problems when the training dataset is composed of a large number of unlabeled observations and a smaller number of labeled observations. In this paper, we propose an approach for developing transductive prediction procedures that are able to take advantage of the sparsity in the high dimensional linear regression. More precisely, we define transductive versions of the LASSO and the Dantzig Selector. These procedures combine labeled and unlabeled observations of the training dataset to produce a prediction for the unlabeled observations. We propose an experimental study of the transductive estimators that shows that they improve on the LASSO and Dantzig Selector in many situations, and particularly in high dimensional problems when the predictors are correlated. We then provide non-asymptotic theoretical guarantees for these estimation methods. Interestingly, our theoretical results show that the Transductive LASSO and Dantzig Selector satisfy sparsity inequalities under weaker assumptions than those required for the "original" LASSO.
