Researcher profile

Céline Lévy-Leduc

Céline Lévy-Leduc contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
10 works
0 followers
6 topics
4 close collaborators

Published work

10 published item(s)

preprint, 2022, arXiv

Identification of prognostic and predictive biomarkers in high-dimensional data with PPLasso

In clinical trials, identification of prognostic and predictive biomarkers is essential to precision medicine. Prognostic biomarkers can be useful for preventing the occurrence of the disease, and predictive biomarkers can be used to identify patients who may benefit from the treatment. Previous research has mainly focused on clinical characteristics, and the use of genomic data in this area has hardly been studied. A new method is required to simultaneously select prognostic and predictive biomarkers in high-dimensional genomic data where biomarkers are highly correlated. We propose a novel approach called PPLasso (Prognostic Predictive Lasso) that integrates prognostic and predictive effects into one statistical model. PPLasso also takes into account the correlations between biomarkers, which can alter the biomarker selection accuracy. Our method consists of transforming the design matrix to remove the correlations between the biomarkers before applying the generalized Lasso. In a comprehensive numerical evaluation, we show that PPLasso outperforms the traditional Lasso approach on both prognostic and predictive biomarker identification in various scenarios. Finally, our method is applied to publicly available transcriptomic data from the clinical trial RV144. Our method is implemented in the PPLasso R package available from the Comprehensive R Archive Network (CRAN).

preprint, 2022, arXiv

Variable selection in high-dimensional logistic regression models using a whitening approach

In bioinformatics, the rapid development of sequencing technology has enabled us to collect an increasing amount of omics data. Classification based on omics data is one of the central problems in biomedical research. However, omics data usually has a limited sample size but high feature dimensions, and it is assumed that only a few features (biomarkers) are active, i.e. informative for discriminating between different categories (for example, cancer subtypes or responders/non-responders to treatment). Identifying active biomarkers for classification has therefore become fundamental for omics data analysis. Focusing on binary classification, we propose an innovative feature selection method designed to deal with the high correlations between the biomarkers. Several studies have shown the notorious influence of correlated biomarkers and the difficulty of accurately identifying active ones. Our method, WLogit, consists of whitening the design matrix to remove the correlations between biomarkers, then using a penalized criterion adapted to the logistic regression model to select features. The performance of WLogit is assessed using synthetic data in several scenarios and compared with other approaches. The results suggest that WLogit can identify almost all active biomarkers even when the biomarkers are highly correlated, while the other methods fail, which consequently leads to higher classification accuracy. The performance is also evaluated on the classification of two lymphoma subtypes, where the obtained classifier also outperforms the other methods. Our method is implemented in the WLogit R package available from the Comprehensive R Archive Network (CRAN).

preprint, 2022, arXiv

Variable selection in sparse GLARMA models

In this paper, we propose a novel and efficient two-stage variable selection approach for sparse GLARMA models, which are pervasive for modeling discrete-valued time series. Our approach consists of iteratively combining the estimation of the autoregressive moving average (ARMA) coefficients of GLARMA models with regularized methods designed for performing variable selection in regression coefficients of Generalized Linear Models (GLM). We first establish the consistency of the ARMA part coefficient estimators in a specific case. Then, we explain how to efficiently implement our approach. Finally, we assess the performance of our methodology using synthetic data, compare it with alternative methods and illustrate it on a real-world application. Our approach, which is implemented in the GlarmaVarSel R package available on CRAN, is very attractive since it benefits from a low computational load and is able to outperform the other methods in terms of coefficient estimation, particularly in recovering the non-null regression coefficients.

preprint, 2020, arXiv

A variable selection approach for highly correlated predictors in high-dimensional genomic data

In genomic studies, identifying biomarkers associated with a variable of interest is a major concern in biomedical research. Regularized approaches are classically used to perform variable selection in high-dimensional linear models. However, these methods can fail in highly correlated settings. We propose a novel variable selection approach called WLasso, which takes these correlations into account. It consists of rewriting the initial high-dimensional linear model to remove the correlation between the biomarkers (predictors) and applying the generalized Lasso criterion. The performance of WLasso is assessed using synthetic data in several scenarios and compared with recent alternative approaches. The results show that when the biomarkers are highly correlated, WLasso outperforms the other approaches in sparse high-dimensional frameworks. The method is also successfully illustrated on publicly available gene expression data in breast cancer. Our method is implemented in the WLasso R package, which is available from the Comprehensive R Archive Network.
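The whitening step described in this abstract can be illustrated with a minimal Python sketch. It is not the WLasso package's implementation: the function name `whiten_design`, the `ridge` parameter and the use of the plain sample covariance are assumptions made for illustration, and the toy example uses more samples than predictors so that the sample covariance is invertible. In a genuinely high-dimensional setting a dedicated estimator of the covariance would be needed.

```python
import numpy as np

def whiten_design(X, ridge=1e-8):
    """Return (X_tilde, inv_sqrt) with X_tilde = Xc @ Sigma^{-1/2}.

    Sigma is the sample covariance of the centered columns of X
    (an assumption of this sketch); `ridge` guards against tiny eigenvalues.
    """
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / X.shape[0]
    eigval, eigvec = np.linalg.eigh(sigma)            # sigma is symmetric
    inv_sqrt = eigvec @ np.diag(1.0 / np.sqrt(eigval + ridge)) @ eigvec.T
    return Xc @ inv_sqrt, inv_sqrt

# Toy data: 5 strongly correlated predictors, 200 samples.
rng = np.random.default_rng(0)
n, p = 200, 5
idx = np.arange(p)
cov = 0.9 ** np.abs(np.subtract.outer(idx, idx))      # AR(1)-type correlation
X = rng.multivariate_normal(np.zeros(p), cov, size=n)

X_tilde, inv_sqrt = whiten_design(X)
gram = X_tilde.T @ X_tilde / n                        # close to the identity
```

After whitening, the model y = X b + e can be rewritten as y = X_tilde b_tilde + e with b = Sigma^{-1/2} b_tilde, so a sparsity penalty on b becomes a penalty on a linear transform of b_tilde, which is the form handled by a generalized Lasso criterion.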

preprint, 2012, arXiv

Homogeneity and change-point detection tests for multivariate data using rank statistics

Detecting and locating changes in highly multivariate data is a major concern in several current statistical applications. In this context, the first contribution of the paper is a novel non-parametric two-sample homogeneity test for multivariate data based on the well-known Wilcoxon rank statistic. The proposed two-sample homogeneity test statistic can be extended to deal with ordinal or censored data as well as to test for the homogeneity of more than two samples. The second contribution of the paper concerns the use of the proposed test statistic to perform retrospective change-point analysis. It is first shown that the approach is computationally feasible even when looking for a large number of change-points thanks to the use of dynamic programming. Computable asymptotic $p$-values for the test are then provided in the case where a single potential change-point is to be detected. Compared to available alternatives, the proposed approach appears to be very reliable and robust. This is particularly true in situations where the data is contaminated by outliers or corrupted by noise and where the potential changes only affect subsets of the coordinates of the data.
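As a rough illustration of how a Wilcoxon rank statistic can locate a single change-point, here is a univariate Python sketch. It is a simplification and not the multivariate test of the paper: the function name `wilcoxon_changepoint` is hypothetical, ties are assumed absent, and the maximum of the scanned statistic is not calibrated against the asymptotic p-values mentioned in the abstract.

```python
import numpy as np

def wilcoxon_changepoint(x):
    """Scan all split points t and return (t, z) maximizing |z|.

    z is the standardized Wilcoxon rank-sum statistic comparing x[:t]
    with x[t:]; ranks are computed once on the pooled sample.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = np.argsort(np.argsort(x)) + 1             # ranks 1..n, no ties assumed
    t = np.arange(1, n)                               # candidate split points
    rank_sum = np.cumsum(ranks)[:-1]                  # sum of ranks of x[:t]
    mean = t * (n + 1) / 2.0                          # null mean of the rank sum
    var = t * (n - t) * (n + 1) / 12.0                # null variance (no ties)
    z = (rank_sum - mean) / np.sqrt(var)
    best = int(np.argmax(np.abs(z)))
    return int(t[best]), float(z[best])

rng = np.random.default_rng(1)
signal = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])
tau, z = wilcoxon_changepoint(signal)                 # true change at index 100
```

Because only ranks enter the statistic, a few extreme outliers shift it very little, which is the robustness property the abstract emphasizes.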

preprint, 2011, arXiv

Distributed detection/localization of change-points in high-dimensional network traffic data

We propose a novel approach for distributed statistical detection of change-points in high-volume network traffic. We consider more specifically the task of detecting and identifying the targets of Distributed Denial of Service (DDoS) attacks. The proposed algorithm, called DTopRank, performs distributed network anomaly detection by aggregating the partial information gathered in a set of network monitors. In order to address massive data while limiting the communication overhead within the network, the approach combines record filtering at the monitor level and a nonparametric rank test for doubly censored time series at the central decision site. The performance of the DTopRank algorithm is illustrated both on synthetic data and on a traffic trace provided by a major Internet service provider.

preprint, 2011, arXiv

OMP-type Algorithm with Structured Sparsity Patterns for Multipath Radar Signals

A transmitted, unknown radar signal is observed at the receiver through more than one path in additive noise. The aim is to recover the waveform of the intercepted signal and to simultaneously estimate the direction of arrival (DOA). We propose an approach exploiting the parsimonious time-frequency representation of the signal by applying a new OMP-type algorithm for structured sparsity patterns. An important issue is the scalability of the proposed algorithm, since high-dimensional models must be used for radar signals. Monte-Carlo simulations for modulated signals illustrate the good performance of the method even for low signal-to-noise ratios, with a gain of 20 dB for the DOA estimation compared with an elementary method.
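For readers unfamiliar with the baseline, the sketch below shows standard Orthogonal Matching Pursuit (OMP) in Python. It is the plain algorithm, not the structured-sparsity variant proposed in the paper, and the dictionary, dimensions and coefficient values are made up for illustration.

```python
import numpy as np

def omp(A, y, k):
    """Plain OMP: greedily pick k columns of A, refitting by least squares."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        corr = np.abs(A.T @ residual)                 # match atoms to the residual
        if support:
            corr[support] = 0.0                       # never reselect an atom
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef           # orthogonal to chosen atoms
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x, sorted(support)

# Toy noiseless problem: 3 active atoms out of 100, 80 measurements.
rng = np.random.default_rng(2)
A = rng.normal(size=(80, 100))
A /= np.linalg.norm(A, axis=0)                        # unit-norm columns
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [3.0, -2.0, 1.5]
y = A @ x_true

x_hat, support = omp(A, y, k=3)
```

A structured variant would replace the single-atom selection step with the selection of a whole admissible group of atoms; how those groups are defined for multipath radar signals is specific to the paper and is not reproduced here.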

preprint, 2011, arXiv

Robust Retrospective Multiple Change-point Estimation for Multivariate Data

We propose a non-parametric statistical procedure for detecting multiple change-points in multidimensional signals. The method is based on a test statistic that generalizes the well-known Kruskal-Wallis procedure to the multivariate setting. The proposed approach does not require any knowledge about the distribution of the observations and is parameter-free. It is computationally efficient thanks to the use of dynamic programming and can also be applied when the number of change-points is unknown. The method is shown through simulations to be more robust than alternatives, particularly when faced with atypical distributions (e.g., with outliers), high noise levels and/or high-dimensional data.
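The dynamic programming idea can be sketched in one dimension. With univariate data, maximizing the Kruskal-Wallis statistic over segmentations is equivalent to minimizing the within-segment sum of squared deviations of the ranks, because the total sum of squares of the ranks is fixed. The Python sketch below uses that cost; it is a univariate simplification with a known number of change-points, not the multivariate statistic of the paper, and the function name `dp_changepoints` is hypothetical.

```python
import numpy as np

def dp_changepoints(x, n_changes):
    """Best segmentation of the ranks of x into n_changes + 1 segments."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = np.argsort(np.argsort(x)).astype(float) + 1.0
    s1 = np.concatenate([[0.0], np.cumsum(ranks)])
    s2 = np.concatenate([[0.0], np.cumsum(ranks ** 2)])

    def cost(i, j):
        """Within-segment sum of squares of ranks[i:j], in O(1)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    n_seg = n_changes + 1
    best = np.full((n_seg + 1, n + 1), np.inf)        # best[k, j]: k segments on x[:j]
    back = np.zeros((n_seg + 1, n + 1), dtype=int)    # start of the last segment
    best[0, 0] = 0.0
    for k in range(1, n_seg + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1, i] + cost(i, j)
                if c < best[k, j]:
                    best[k, j] = c
                    back[k, j] = i
    # Backtrack from the full sample to recover the segment boundaries.
    changes, j = [], n
    for k in range(n_seg, 1, -1):
        j = back[k, j]
        changes.append(int(j))
    return sorted(changes)

rng = np.random.default_rng(3)
data = np.concatenate([rng.normal(0, 1, 50), rng.normal(3, 1, 50), rng.normal(0, 1, 50)])
changes = dp_changepoints(data, n_changes=2)          # true changes at 50 and 100
```

The recursion costs on the order of K n^2 cost evaluations for K segments, each in constant time thanks to the cumulative sums.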

preprint, 2010, arXiv

Asymptotic properties of U-processes under long-range dependence

Let $(X_i)_{i\geq 1}$ be a stationary mean-zero Gaussian process with covariances $\rho(k)=\mathbb{E}(X_{1}X_{k+1})$ satisfying: $\rho(0)=1$ and $\rho(k)=k^{-D} L(k)$ where $D$ is in $(0,1)$ and $L$ is slowly varying at infinity. Consider the $U$-process $\{U_n(r),\; r\in I\}$ defined as $$ U_n(r)=\frac{1}{n(n-1)}\sum_{1\leq i\neq j\leq n}\mathbf{1}_{\{G(X_i,X_j)\leq r\}}\; , $$ where $I$ is an interval included in $\mathbb{R}$ and $G$ is a symmetric function. In this paper, we provide central and non-central limit theorems for $U_n$. They are used to derive the asymptotic behavior of the Hodges-Lehmann estimator, the Wilcoxon signed-rank statistic, the sample correlation integral and an associated scale estimator. The limiting distributions are expressed through multiple Wiener-Itô integrals.
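The definition of $U_n(r)$ can be checked numerically on a tiny sample. The Python sketch below evaluates it directly for a user-supplied symmetric kernel; the function names `u_process` and `hodges_lehmann` are hypothetical, and the three-point sample is only there to make the arithmetic easy to follow. It says nothing about the long-range dependent setting studied in the paper.

```python
import numpy as np

def u_process(x, G, r):
    """U_n(r): fraction of ordered pairs (i, j), i != j, with G(x_i, x_j) <= r."""
    x = np.asarray(x, dtype=float)
    n = x.size
    g = G(x[:, None], x[None, :])                     # n x n matrix of G(x_i, x_j)
    off_diag = ~np.eye(n, dtype=bool)                 # drop the i == j terms
    return float(np.mean(g[off_diag] <= r))

def hodges_lehmann(x):
    """Median of the pairwise averages (x_i + x_j) / 2 over i < j."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(x.size, k=1)
    return float(np.median((x[i] + x[j]) / 2.0))

sample = np.array([0.0, 1.0, 3.0])
distance = lambda a, b: np.abs(a - b)                 # kernel of the correlation integral

u_at_1_5 = u_process(sample, distance, 1.5)           # pair distances are 1, 2, 3
hl = hodges_lehmann(sample)                           # pair averages are 0.5, 1.5, 2.0
```

With $G(x,y)=|x-y|$ the quantity is the sample correlation integral; here only the pair at distance 1 falls below 1.5, so the value is 2/6 = 1/3. The Hodges-Lehmann estimator corresponds to $G(x,y)=(x+y)/2$ and is the median of the pairwise averages, 1.5 in this example.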

preprint, 2010, arXiv

Central limit theorem for the robust log-regression wavelet estimation of the memory parameter in the Gaussian semi-parametric context

In this paper, we study robust estimators of the memory parameter d of a (possibly) non-stationary Gaussian time series with generalized spectral density f. This generalized spectral density is characterized by the memory parameter d and by a function f* which specifies the short-range dependence structure of the process. Our setting is semi-parametric since both f* and d are unknown and d is the only parameter of interest. The memory parameter d is estimated by regressing the logarithm of the estimated variance of the wavelet coefficients at different scales. The two estimators of d that we consider are based on robust estimators of the variance of the wavelet coefficients, namely the square of the scale estimator proposed by Rousseeuw and Croux (1993) and the median of the square of the wavelet coefficients. We establish a Central Limit Theorem for these robust estimators as well as for the estimator of d based on the classical estimator of the variance proposed by Moulines, Roueff and Taqqu (2007). Some Monte-Carlo experiments are presented to illustrate our claims and compare the performance of the different estimators. The properties of the three estimators are also compared on the Nile River data and the Internet traffic packet counts data. The theoretical results and the empirical evidence strongly suggest using the robust estimators as an alternative to estimate the memory parameter d of Gaussian time series.
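The log-regression idea can be sketched in a few lines of Python. The sketch assumes an orthonormal Haar wavelet, an unweighted least-squares fit, and the scaling relation in which the variance of the wavelet coefficients at scale j grows like 2^(2dj), so that half the fitted slope estimates d. These choices, as well as the function names, are assumptions made for illustration; the paper's estimators use a more general wavelet setting, and the Rousseeuw-Croux scale estimator is not implemented here.

```python
import numpy as np

def haar_details(x, n_scales):
    """Orthonormal Haar detail coefficients for scales 1..n_scales."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(n_scales):
        approx = approx[: approx.size // 2 * 2]       # drop a trailing odd sample
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return details

def memory_parameter(x, n_scales=5, robust=False):
    """Estimate d as half the slope of log2(variance) against the scale index."""
    scales = np.arange(1, n_scales + 1)
    log_var = []
    for w in haar_details(x, n_scales):
        if robust:
            # Median of squares, rescaled by the chi-square(1) median (0.6745**2)
            # so that it is consistent for the variance of Gaussian coefficients.
            var = np.median(w ** 2) / 0.6745 ** 2
        else:
            var = np.mean(w ** 2)                     # classical variance estimate
        log_var.append(np.log2(var))
    slope = np.polyfit(scales, log_var, 1)[0]
    return slope / 2.0

rng = np.random.default_rng(4)
white_noise = rng.normal(size=4096)                   # memory parameter d = 0
d_classical = memory_parameter(white_noise)
d_robust = memory_parameter(white_noise, robust=True)
```

For white noise both estimates should be close to zero. The rescaling constant in the robust branch shifts every log-variance by the same amount and therefore does not change the slope; it is kept only so that the per-scale variances are comparable between the two branches.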