Source author record

Vivian Viallon

Vivian Viallon appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

8 works
7 topics
4 close collaborators

Published work

8 published items

preprint · 2026 · arXiv

On the estimation of inclusion probabilities for weighted analyses of nested case control studies

Nested case-control (NCC) studies are a widely adopted design in epidemiology for investigating exposure-disease relationships. This paper examines weighted analyses in NCC studies, focusing on two prominent weighting methods: Kaplan-Meier (KM) weights and Generalized Additive Model (GAM) weights. We consider three target estimands: log-hazard ratios, conditional survival, and associations between exposures. While KM- and GAM-weights are generally robust, we identify specific scenarios where they can lead to biased estimates. We demonstrate that KM-weights can lead to biased estimates when a proportion of the originating cohort is effectively ineligible for NCC selection, particularly with small case proportions or numerous matching factors. GAM-weights, in contrast, can yield biased results if interactions between matching factors influence disease risk and are not adequately incorporated into the weight calculation. Using Directed Acyclic Graphs (DAGs), we develop a framework to systematically determine which variables should be included in weight calculations. We show that the optimal set of variables depends on the target estimand and on the causal relationships between matching factors, exposures, and disease risk. We illustrate our findings with both synthetic data and real data from the European Prospective Investigation into Cancer and Nutrition (EPIC) study. Additionally, we extend the application of GAM-weights to "untypical" NCC studies, in which only a subset of cases is included. Our work provides crucial insights for conducting accurate and robust weighted analyses in NCC studies.
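
For readers unfamiliar with KM-type weights, the following is a minimal sketch of Samuelsen-style Kaplan-Meier inclusion probabilities for an unmatched NCC design with `m` controls drawn per case. It assumes no tied event times and that every non-case in the risk set is eligible for selection, which is exactly the assumption the abstract says can fail. The function name and toy cohort are hypothetical; this is not the authors' code.

```python
def km_inclusion_probabilities(times, events, m):
    """Samuelsen-type KM inclusion probabilities for an unmatched NCC design.

    times  : follow-up time of every cohort member
    events : 1 if the member became a case, 0 otherwise
    m      : number of controls sampled per case (hypothetical setting)
    Assumes no tied event times and that every non-case at risk is eligible.
    """
    case_times = [t for t, e in zip(times, events) if e == 1]
    probs = []
    for t_j, e_j in zip(times, events):
        if e_j == 1:
            probs.append(1.0)  # cases are always included in the NCC sample
            continue
        not_sampled = 1.0
        for t_i in case_times:
            if t_j >= t_i:  # member j belongs to the risk set of the case at t_i
                n_at_risk = sum(1 for t in times if t >= t_i)
                # chance of being drawn among the n_at_risk - 1 potential controls
                not_sampled *= 1.0 - min(m / (n_at_risk - 1), 1.0)
        probs.append(1.0 - not_sampled)
    return probs


# Hypothetical toy cohort of five members, one control per case
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 0, 0]
probs = km_inclusion_probabilities(times, events, m=1)
print(probs)  # [1.0, 0.25, 1.0, 0.625, 0.625]
```

In an actual analysis the weight 1/p is applied only to members who were actually sampled. If part of the cohort could never be selected, these probabilities overstate the true inclusion chances, which is one way the bias described above can arise.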

preprint · 2016 · arXiv

Can collider bias fully explain the obesity paradox?

The "obesity paradox" has been reported in several observational studies, in which obesity was associated with decreased mortality among individuals suffering from a chronic disease such as diabetes or heart failure. Causal arguments have recently been put forward to explain this apparently paradoxical finding: because the chronic disease is caused by obesity, the observed "protective effect" of obesity among patients with, say, diabetes has no causal value. Recently, Sperrin et al. (2016) relaunched the debate and claimed that the resulting bias, the so-called collider bias, was unlikely to be the main explanation for the obesity paradox. However, several issues in their work make their conclusions questionable. In this article, we first study the bias, that is, the difference between (i) the association $\Delta_{AS}$ between obesity and early death among patients suffering from the chronic disease and (ii) the causal effect considered by Sperrin et al. Under the usual framework of structural causal models, we explain why this bias can be much larger than what these authors reported. We further consider alternative causal effects of potential interest and study how they differ from $\Delta_{AS}$. Numerical examples illustrate the magnitude of these differences under realistic scenarios. We show that $\Delta_{AS}$ can be negative while the causal effects we considered are all positive. Therefore, even under the very simple generative model we considered, collider bias can be the sole cause of the obesity paradox.
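
The mechanism can be reproduced with a toy calculation. The sketch below uses a hypothetical structural model that is not the one in the paper: O (obesity) and an unmeasured factor U both cause the disease D, and U also strongly raises the risk of early death Y. Obesity increases the risk of death by 0.05 at every level of U, yet the association $\Delta_{AS}$ computed among diseased patients comes out negative. All probabilities are made-up illustration values.

```python
# Hypothetical toy structural causal model (illustration values only):
# O = obesity, U = unmeasured risk factor, D = chronic disease, Y = early death.
P_U = 0.3  # P(U = 1)


def p_disease(o, u):
    """P(D = 1 | O = o, U = u): both obesity and U cause the disease."""
    return {(0, 0): 0.01, (1, 0): 0.20, (0, 1): 0.20, (1, 1): 0.40}[(o, u)]


def p_death(o, u):
    """P(Y = 1 | O = o, U = u, D = 1): obesity adds 0.05 at every level of U."""
    return 0.10 + 0.05 * o + 0.40 * u


def death_risk_among_diseased(o):
    """P(Y = 1 | O = o, D = 1), marginalizing over the unobserved U."""
    num = 0.0
    den = 0.0
    for u in (0, 1):
        w = (P_U if u == 1 else 1 - P_U) * p_disease(o, u)  # P(U = u, D = 1 | O = o)
        num += w * p_death(o, u)
        den += w
    return num / den


delta_as = death_risk_among_diseased(1) - death_risk_among_diseased(0)
print(f"Delta_AS = {delta_as:.3f}")  # Delta_AS = -0.124
```

Among diseased non-obese patients, the disease is mostly explained by U, so they carry more of U's mortality risk. Conditioning on the collider D therefore makes obesity look protective.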

preprint · 2016 · arXiv

Regression modeling on stratified data with the lasso

We consider the estimation of regression models on strata defined using a categorical covariate, in order to identify interactions between this categorical covariate and the other predictors. A basic approach requires the choice of a reference stratum. We show that the performance of a penalized version of this approach depends on this arbitrary choice. We propose a refined approach that bypasses this arbitrary choice, at almost no additional computational cost. Regarding model selection consistency, our proposal mimics the strategy based on an optimal and covariate-specific choice for the reference stratum. Results from an empirical study confirm that our proposal generally outperforms the basic approach in the identification and description of the interactions. An illustration on gene expression data is provided.
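
The dependence on the reference stratum can be seen without fitting a model. Under the basic approach, each stratum's coefficient for a predictor is written as the reference-stratum coefficient plus a deviation, and the $\ell_1$ penalty is applied to that parametrization. The sketch below assumes a plain, unweighted penalty on (reference coefficient, deviations) for a single predictor with hypothetical per-stratum effects. It illustrates only the basic approach, not the refined proposal.

```python
def reference_parametrization(stratum_coefs, ref):
    """Rewrite per-stratum coefficients as a reference value plus deviations."""
    base = stratum_coefs[ref]
    deviations = [b - base for k, b in enumerate(stratum_coefs) if k != ref]
    return base, deviations


def l1_penalty(stratum_coefs, ref):
    """Plain l1 norm of (reference coefficient, deviations) for one predictor."""
    base, deviations = reference_parametrization(stratum_coefs, ref)
    return abs(base) + sum(abs(d) for d in deviations)


# Hypothetical effect of one predictor in strata 0, 1 and 2
coefs = [1.0, 1.0, 3.0]
for ref in range(len(coefs)):
    base, dev = reference_parametrization(coefs, ref)
    print(ref, base, dev, l1_penalty(coefs, ref))
```

With stratum 0 or 1 as reference, only one deviation is nonzero and the penalty is 3.0. With stratum 2 as reference, both deviations are nonzero and the penalty rises to 7.0, so the same underlying interaction pattern is penalized differently depending on an arbitrary choice.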

preprint · 2014 · arXiv

Joint estimation of $K$ related regression models with simple $L_1$-norm penalties

We propose a new approach, along with refinements, based on $L_1$ penalties and aimed at jointly estimating several related regression models. Its main interest is that it can be rewritten as a weighted lasso on a simple transformation of the original data set. In particular, it does not need new dedicated algorithms and is ready to implement under a variety of regression models, e.g., using standard R packages. Moreover, asymptotic oracle properties are derived along with preliminary non-asymptotic results, suggesting good theoretical properties. Our approach is further compared with state-of-the-art competitors under various settings on synthetic data: these empirical results confirm that our approach performs at least similarly to its competitors. As a final illustration, an analysis of road safety data is provided.
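
The abstract states that the joint problem can be rewritten as a weighted lasso on a simple transformation of the data. One natural transformation, assumed here rather than taken from the paper, writes each model's coefficients as $\beta_k = \beta + \gamma_k$ (a shared part plus a model-specific part) and stacks the $K$ design matrices block-wise. The penalty weights `taus` are hypothetical. After stacking, any standard lasso solver can be reused by rescaling columns.

```python
import numpy as np


def augmented_design(X_list):
    """Stacked design for theta = [beta, gamma_1, ..., gamma_K].

    Rows of model k are [X_k, 0, ..., X_k (block k), ..., 0], so that
    X_tilde @ theta reproduces X_k @ (beta + gamma_k) for each model.
    """
    K = len(X_list)
    rows = []
    for k, X_k in enumerate(X_list):
        blocks = [X_k] + [X_k if j == k else np.zeros_like(X_k) for j in range(K)]
        rows.append(np.hstack(blocks))
    return np.vstack(rows)


def penalty_weights(p, taus):
    """Weight 1 on the shared beta and tau_k on each gamma_k (hypothetical choice)."""
    return np.concatenate([np.ones(p)] + [np.full(p, tau) for tau in taus])


rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(5, 3)), rng.normal(size=(4, 3))
beta = np.array([1.0, 0.0, -2.0])
gammas = [np.array([0.0, 0.5, 0.0]), np.zeros(3)]

X_tilde = augmented_design([X1, X2])
theta = np.concatenate([beta] + gammas)
stacked = np.concatenate([X1 @ (beta + gammas[0]), X2 @ (beta + gammas[1])])
assert np.allclose(X_tilde @ theta, stacked)

# A weighted lasso with weights w equals an ordinary lasso on the columns
# X_tilde / w; coefficients fitted on the rescaled design are mapped back
# to theta by dividing them elementwise by w.
w = penalty_weights(3, taus=[0.7, 0.7])
X_rescaled = X_tilde / w
```

The rescaling works because substituting $\phi_j = w_j \theta_j$ turns the penalty $\sum_j w_j |\theta_j|$ into a plain $\|\phi\|_1$ while leaving the fitted values unchanged.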

preprint · 2012 · arXiv

Time-dependent AUC with right-censored data: a survey study

The ROC curve and the corresponding AUC are popular tools for the evaluation of diagnostic tests. They have recently been extended to assess prognostic markers and predictive models. However, due to the many particularities of time-to-event outcomes, various definitions and estimators have been proposed in the literature. This review article presents those that accommodate right-censoring, which is common when evaluating such prognostic markers.
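
As one concrete instance among the definitions such a survey covers, the sketch below computes the cumulative/dynamic AUC at horizon `t`, $P(M_i > M_j \mid T_i \le t, T_j > t)$, using inverse-probability-of-censoring weights from a Kaplan-Meier estimate of the censoring distribution. Cases (events by `t`) are weighted by $1/\hat G(T_i^-)$. The common control weight $1/\hat G(t)$ cancels, and marker ties count 1/2. Function names and toy data are hypothetical, and this estimator is not necessarily the one the review recommends.

```python
def censoring_survival_before(times, events, s):
    """KM estimate of G(s-) = P(C >= s), treating censorings (event == 0) as 'events'."""
    g = 1.0
    for u in sorted(set(times)):
        if u >= s:
            break
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        g *= 1.0 - censored / at_risk
    return g


def ipcw_cumulative_dynamic_auc(times, events, markers, t):
    """IPCW estimate of P(M_i > M_j | T_i <= t, T_j > t) (higher marker = higher risk)."""
    n_controls = sum(1 for tj in times if tj > t)
    num = 0.0
    den_cases = 0.0
    for ti, ei, mi in zip(times, events, markers):
        if ti <= t and ei == 1:  # observed case by the horizon t
            w = 1.0 / censoring_survival_before(times, events, ti)
            den_cases += w
            for tj, mj in zip(times, markers):
                if tj > t:  # control: still event-free at t
                    if mi > mj:
                        num += w
                    elif mi == mj:
                        num += 0.5 * w
    if den_cases == 0 or n_controls == 0:
        raise ValueError("need at least one observed case and one control at t")
    return num / (den_cases * n_controls)


# Hypothetical data: one subject censored at time 2
auc = ipcw_cumulative_dynamic_auc([1, 2, 3, 4], [1, 0, 1, 1], [1, 3, 4, 2], t=3)
print(auc)  # about 0.6
```

Without censoring all weights equal 1 and the estimate reduces to the empirical concordance between cases and controls. The double loop is quadratic in the sample size, which is acceptable for a sketch.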

preprint · 2011 · arXiv

Safe Feature Elimination for the LASSO and Sparse Supervised Learning Problems

We describe a fast method to eliminate features (variables) in $\ell_1$-penalized least-squares regression (LASSO) problems. Eliminating features can substantially reduce running time, especially for large values of the penalty parameter. Our method is not heuristic: it only eliminates features that are guaranteed to be absent from the LASSO solution. The feature elimination step is easy to parallelize, since each feature can be tested for elimination independently. Moreover, the computational effort of our method is negligible compared to that of solving the LASSO problem: roughly the same as a single gradient step. Our method extends the scope of existing LASSO algorithms to larger data sets that were previously out of their reach. We show how our method can be extended to general $\ell_1$-penalized convex problems and present preliminary results for the Sparse Support Vector Machine and Logistic Regression problems.
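
The basic SAFE test for the LASSO, as it is commonly stated for $\min_w \tfrac12\|Xw - y\|_2^2 + \lambda\|w\|_1$ with $\lambda_{\max} = \max_k |x_k^\top y|$, discards feature $k$ whenever $|x_k^\top y| < \lambda - \|x_k\|_2\,\|y\|_2\,(\lambda_{\max} - \lambda)/\lambda_{\max}$. The sketch below assumes $0 < \lambda \le \lambda_{\max}$ and implements only this basic test, not any refined variant. Each feature is tested independently with one inner product, in line with the abstract's cost claim.

```python
import numpy as np


def safe_keep_mask(X, y, lam):
    """Basic SAFE test for min_w 0.5*||X w - y||^2 + lam*||w||_1.

    Returns a boolean mask: True for features the test cannot eliminate,
    False for features guaranteed to have a zero coefficient.
    Assumes 0 < lam <= lam_max.
    """
    corr = np.abs(X.T @ y)               # |x_k^T y| for every feature k
    lam_max = corr.max()                 # smallest lam giving the all-zero solution
    col_norms = np.linalg.norm(X, axis=0)
    threshold = lam - col_norms * np.linalg.norm(y) * (lam_max - lam) / lam_max
    return corr >= threshold


# Hypothetical orthonormal design, where the LASSO solution is a soft-threshold
X = np.eye(3)
y = np.array([3.0, 2.0, 1.0])
mask = safe_keep_mask(X, y, lam=2.9)
print(mask)  # [ True False False]
```

For this orthonormal design the LASSO solution soft-thresholds $x_k^\top y$ at 2.9, giving a nonzero coefficient only for the first feature, so the two eliminated features are indeed zero in the solution.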

preprint · 2010 · arXiv

An empirical comparative study of approximate methods for binary graphical models; application to the search of associations among causes of death in French death certificates

Looking for associations among multiple variables is a topical issue in statistics due to the increasing amount of data encountered in biology, medicine and many other domains involving statistical applications. Graphical models have recently gained popularity for this purpose in the statistical literature. Following the ideas of the LASSO procedure designed for the linear regression framework, recent developments dealing with graphical model selection have been based on $\ell_1$-penalization. In the binary case, however, exact inference is generally very slow or even intractable because of the form of the so-called log-partition function. Various approximate methods have recently been proposed in the literature and the main objective of this paper is to compare them. Through an extensive simulation study, we show that a simple modification of a method relying on a Gaussian approximation achieves good performance and is very fast. We present a real application in which we search for associations among causes of death recorded on French death certificates.

preprint · 2010 · arXiv

Safe Feature Elimination in Sparse Supervised Learning

We investigate fast methods for eliminating variables (features) in supervised learning problems involving a convex loss function and an $\ell_1$-norm penalty, potentially yielding a substantial reduction in the number of variables before the supervised learning algorithm is run. The methods are not heuristic: they only eliminate features that are guaranteed to be absent after solving the learning problem. Our framework applies to a large class of problems, including support vector machine classification, logistic regression and least-squares. The complexity of the feature elimination step is negligible compared to the typical computational effort involved in the sparse supervised learning problem: it grows linearly with the number of features times the number of examples, and is much lower when the data are sparse. We apply our method to data sets arising in text classification and observe a dramatic reduction in dimensionality, and hence in the computational effort required to solve the learning problem, especially when very sparse classifiers are sought. Our method immediately extends the scope of existing algorithms, allowing them to run on data sets whose sizes were previously out of their reach.