Source author record

Ruben Zamar

Ruben Zamar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Methodology, Machine Learning

Catalog footprint

What is connected

3 works

2 topics

4 close collaborators

preprint, 2022, arXiv

Robust graphical lasso based on multivariate Winsorization

We propose the use of a robust covariance estimator based on multivariate Winsorization in the context of the Tarr-Muller-Weber framework for sparse estimation of the precision matrix of a Gaussian graphical model. Like Croux-Ollerer's precision matrix estimator, our proposed estimator attains the maximum finite sample breakdown point of 0.5 under cellwise contamination. We conduct an extensive Monte Carlo simulation study to assess the performance of our proposal and of the existing ones. We find that ours behaves competitively with respect to both the estimation of the precision matrix and the recovery of the graph. We demonstrate the usefulness of the proposed methodology in a real application to breast cancer data.

preprint, 2022, arXiv

Split Regression Modeling

Sparse methods are the standard approach to obtain interpretable models with high prediction accuracy. Alternatively, algorithmic ensemble methods can achieve higher prediction accuracy at the cost of interpretability. However, the use of black-box methods has been heavily criticized for high-stakes decisions, and it has been argued that there does not have to be a trade-off between accuracy and interpretability. To combine high accuracy with interpretability, we generalize best subset selection to best split selection. Best split selection constructs a small number of sparse models, learned jointly from the data, which are then combined in an ensemble. Best split selection determines the models by splitting the available predictor variables among the different models when fitting the data. The proposed methodology results in an ensemble of sparse and diverse models that each provide a possible explanation for the relationship between the predictors and the response. The high computational cost of best split selection motivates the need for computationally tractable approximations. We evaluate a method developed by Christidis et al. (2020) which can be seen as a multi-convex relaxation of best split selection.
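
The splitting idea in this abstract can be illustrated with a brute-force sketch. It is not the authors' implementation and not the Christidis et al. (2020) relaxation. It assumes, for illustration only, that the objective is the sum of the residual sums of squares of the individual models, that each model is an ordinary least-squares fit with an intercept, and that each model holds between one and `max_size` predictors. The names `solve_linear`, `fit_ols`, `best_split`, `predict`, `n_models` and `max_size` are hypothetical.

```python
from itertools import product


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fit_ols(X, y, cols):
    """Least squares with intercept on the columns in `cols`; returns (coefs, rss)."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(cols) + 1
    gram = [[sum(z[a] * z[b] for z in Z) for b in range(k)] for a in range(k)]
    rhs = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(k)]
    beta = solve_linear(gram, rhs)  # assumes the selected columns are not collinear
    rss = sum((yi - sum(bj * zj for bj, zj in zip(beta, z))) ** 2
              for z, yi in zip(Z, y))
    return beta, rss


def best_split(X, y, n_models=2, max_size=2):
    """Exhaustive search over assignments of predictors to models (assumed objective)."""
    p = len(X[0])
    best = None
    # Label 0 leaves a predictor unused; labels 1..n_models assign it to one model,
    # so no predictor is shared between models.
    for labels in product(range(n_models + 1), repeat=p):
        groups = [[j for j in range(p) if labels[j] == g]
                  for g in range(1, n_models + 1)]
        if any(len(g) == 0 or len(g) > max_size for g in groups):
            continue
        fits = [fit_ols(X, y, g) for g in groups]
        total = sum(rss for _, rss in fits)
        if best is None or total < best[0]:
            best = (total, groups, [beta for beta, _ in fits])
    return best


def predict(groups, betas, row):
    """Ensemble prediction: the average of the individual model predictions."""
    preds = [b[0] + sum(bj * row[j] for bj, j in zip(b[1:], g))
             for g, b in zip(groups, betas)]
    return sum(preds) / len(preds)
```

The search visits (n_models + 1)^p assignments, so it is only feasible for a handful of predictors, which is consistent with the abstract's remark about the high computational cost of the exact problem.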

preprint, 2012, arXiv

A robust and sparse K-means clustering algorithm

In many situations where the interest lies in identifying clusters one might expect that not all available variables carry information about these groups. Furthermore, data quality (e.g. outliers or missing entries) might present a serious and sometimes hard-to-assess problem for large and complex datasets. In this paper we show that a small proportion of atypical observations might have serious adverse effects on the solutions found by the sparse clustering algorithm of Witten and Tibshirani (2010). We propose a robustification of their sparse K-means algorithm based on the trimmed K-means algorithm of Cuesta-Albertos et al. (1997). Our proposal is also able to handle datasets with missing values. We illustrate the use of our method on microarray data for cancer patients where we are able to identify strong biological clusters with a much reduced number of genes. Our simulation studies show that, when there are outliers in the data, our robust sparse K-means algorithm performs better than other competing methods in terms of both the selection of features and the identified clusters. This robust sparse K-means algorithm is implemented in the R package RSKC, which is publicly available from the CRAN repository.
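
The trimming step that this abstract builds on can be sketched as follows. This is a minimal illustration of plain trimmed K-means only: it does not include the sparse feature weights or the missing-value handling of the proposed method, and it is not the RSKC package's code. The function name `trimmed_kmeans` and the parameters `init_centers`, `alpha` and `max_iter` are hypothetical, and the initial centers are assumed to be supplied by the caller.

```python
import math


def trimmed_kmeans(points, init_centers, alpha=0.1, max_iter=100):
    """Trimmed K-means sketch: ignore the floor(alpha * n) farthest points each round."""
    n = len(points)
    n_keep = n - int(math.floor(alpha * n))
    centers = [list(c) for c in init_centers]
    labels = [-1] * n
    for _ in range(max_iter):
        # Squared Euclidean distance to, and index of, the nearest center.
        nearest = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            nearest.append((dists[k], k))
        # Keep the n_keep points closest to their centers; label the rest -1.
        order = sorted(range(n), key=lambda i: nearest[i][0])
        kept = set(order[:n_keep])
        labels = [nearest[i][1] if i in kept else -1 for i in range(n)]
        # Recompute each center from its untrimmed members only.
        new_centers = []
        for k, old in enumerate(centers):
            members = [points[i] for i in range(n) if labels[i] == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # empty cluster: keep the previous center
        if new_centers == centers:
            break  # converged: labels match the returned centers
        centers = new_centers
    return centers, labels
```

Because trimmed points never enter the center update, a single extreme observation cannot drag a center away from its cluster, which is the failure mode of non-robust K-means variants that the abstract describes.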