Researcher profile

Emilie Lebarbier

Emilie Lebarbier contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
6 works
0 followers
4 topics
4 close collaborators

Published work

6 published items

preprint · 2020 · arXiv

A new segmentation method for the homogenisation of GNSS-derived IWV time-series

Homogenization is a crucial step in improving the usability of observational data for climate analysis. This work is motivated by the analysis of long series of GNSS Integrated Water Vapour (IWV) data, which have not yet been used in this context. This paper proposes a novel segmentation method that integrates a periodic bias and a heterogeneous, monthly varying variance. The method first estimates the variance using a robust estimator and then estimates the segmentation and periodic bias iteratively. This strategy allows for the use of the dynamic programming algorithm, which remains the most efficient exact algorithm to estimate the change-point positions. The statistical performance of the method is assessed through numerical experiments. An application to a real data set of 120 global GNSS stations is presented. The method is implemented in the R package GNSSseg, which will be available on CRAN.
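The dynamic programming step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the GNSSseg implementation: it assumes the number of segments is fixed in advance, takes the per-observation variance as already estimated, and leaves out the periodic bias entirely. The function name `segment_mean_dp` and its arguments are hypothetical.

```python
import numpy as np

def segment_mean_dp(y, n_segments, variance=None):
    """Exact weighted least-squares segmentation of y into n_segments pieces.

    Returns (change_points, total_cost), where change_points are the start
    indices of segments 2..n_segments. `variance` is an optional array of
    per-observation variances (assumed known here); weights are 1/variance.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    w = np.ones(n) if variance is None else 1.0 / np.asarray(variance, dtype=float)

    # Prefix sums give the weighted cost of any segment in O(1).
    sw = np.concatenate(([0.0], np.cumsum(w)))
    swy = np.concatenate(([0.0], np.cumsum(w * y)))
    swy2 = np.concatenate(([0.0], np.cumsum(w * y * y)))

    def cost(i, j):
        # Weighted residual sum of squares of y[i:j] around its weighted mean.
        s = swy[j] - swy[i]
        return (swy2[j] - swy2[i]) - s * s / (sw[j] - sw[i])

    # best[k, j]: minimal cost of splitting y[:j] into k segments.
    best = np.full((n_segments + 1, n + 1), np.inf)
    last_start = np.zeros((n_segments + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1, i] + cost(i, j)
                if c < best[k, j]:
                    best[k, j] = c
                    last_start[k, j] = i

    # Backtrack the segment start indices from the end of the series.
    starts = []
    j = n
    for k in range(n_segments, 0, -1):
        i = int(last_start[k, j])
        starts.append(i)
        j = i
    return sorted(starts)[1:], float(best[n_segments, n])
```

This plain version costs O(K n^2) operations for K segments and n observations. In an iterative scheme like the one the abstract describes, a routine of this kind would be called on the series after subtracting the current estimate of the periodic bias.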

preprint · 2015 · arXiv

Model selection for the segmentation of multiparameter exponential family distributions

We consider the segmentation problem of univariate distributions from the exponential family with multiple parameters. In segmentation, the choice of the number of segments remains a difficult issue due to the discrete nature of the change-points. In this general exponential family distribution framework, we propose a penalized log-likelihood estimator where the penalty is inspired by papers of L. Birgé and P. Massart. The resulting estimator is proved to satisfy an oracle inequality. We then further study the particular case of categorical variables by comparing the values of the key constants when derived from the specification of our general approach and when obtained by working directly with the characteristics of this distribution. Finally, a simulation study is conducted to assess the performance of our criterion for the exponential distribution, and an application on real data modelled by the categorical distribution is provided.

preprint · 2015 · arXiv

SegCorr: a statistical procedure for the detection of genomic regions of correlated expression

Motivation: Detecting local correlations in expression between neighboring genes along the genome has proved to be an effective strategy to identify possible causes of transcriptional deregulation in cancer. It has been successfully used to illustrate the role of mechanisms such as copy number variation (CNV) or epigenetic alterations as factors that may significantly alter expression in large chromosomal regions (gene silencing or gene activation). Results: The identification of correlated regions requires segmenting the gene expression correlation matrix into regions of homogeneously correlated genes and assessing whether the observed local correlation is significantly higher than the background chromosomal correlation. A unified statistical framework is proposed to achieve these two tasks, where optimal segmentation is efficiently performed using a dynamic programming algorithm, and detection of highly correlated regions is then achieved using an exact test procedure. We also propose a simple and efficient procedure to correct the expression signal for mechanisms already known to impact expression correlation. The performance and robustness of the proposed procedure, called SegCorr, are evaluated on simulated data. The procedure is illustrated on cancer data, where the signal is corrected for correlations possibly caused by copy number variation. The correction permitted the detection of regions with high correlations linked to DNA methylation. Availability and implementation: The R package SegCorr is available on CRAN.

preprint · 2014 · arXiv

Segmentation of multiple series using a Lasso strategy

We propose a new semi-parametric approach to the joint segmentation of multiple series corrupted by a functional part. This problem appears in particular in geodesy, where GPS permanent station coordinate series are affected by undocumented artificial abrupt changes and additionally show prominent periodic variations. Detecting and estimating them is crucial, since those series are used to determine averaged reference coordinates in geosciences and to infer small tectonic motions induced by climate change. We propose an iterative procedure based on Dynamic Programming for the segmentation part and Lasso estimators for the functional part. Our Lasso procedure, based on the dictionary approach, allows us to estimate both smooth functions and functions with local irregularity, which permits more flexibility than previously proposed methods. This yields a better estimation of the bias part and improvements in the segmentation. The performance of our method is assessed using simulated and real data. In particular, we apply our method to data from four GPS stations in Yarragadee, Australia. Our estimation procedure proves to be a reliable tool for assessing series in terms of change detection and estimation of periodic variations, giving an interpretable estimation of the functional part of the model in terms of known functions.

preprint · 2013 · arXiv

Segmentation of the Poisson and negative binomial rate models: a penalized estimator

We consider the segmentation problem of Poisson and negative binomial (i.e. overdispersed Poisson) rate distributions. In segmentation, an important issue remains the choice of the number of segments. To this end, we propose a penalized log-likelihood estimator where the penalty function is constructed in a non-asymptotic context following the works of L. Birgé and P. Massart. The resulting estimator is proved to satisfy an oracle inequality. The performance of our criterion is assessed using simulated and real datasets in the RNA-seq data analysis context.
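To make the idea of a penalized log-likelihood criterion concrete, the sketch below selects the number of segments for Poisson counts by minimizing the negative log-likelihood plus a penalty. It is only an illustration. The penalty shape `beta * K * (1 + log(n / K))` and the default `beta` are placeholder assumptions, not the penalty derived in the paper, and the function names are hypothetical.

```python
import math

def poisson_costs_by_k(y, k_max):
    """Minimal Poisson negative log-likelihood (up to a constant that does
    not depend on the segmentation) for each number of segments 1..k_max."""
    n = len(y)
    prefix = [0]
    for v in y:
        prefix.append(prefix[-1] + v)

    def cost(i, j):
        # Segment y[i:j] fitted with its empirical mean; 0 * log(0) is taken as 0.
        s = prefix[j] - prefix[i]
        return s - s * math.log(s / (j - i)) if s > 0 else 0.0

    inf = float("inf")
    # best[k][j]: minimal cost of splitting y[:j] into k segments.
    best = [[inf] * (n + 1) for _ in range(k_max + 1)]
    best[0][0] = 0.0
    for k in range(1, k_max + 1):
        for j in range(k, n + 1):
            best[k][j] = min(best[k - 1][i] + cost(i, j) for i in range(k - 1, j))
    return [best[k][n] for k in range(1, k_max + 1)]

def select_n_segments(y, k_max, beta=2.0):
    """Pick K minimizing cost(K) + pen(K), with a placeholder penalty
    pen(K) = beta * K * (1 + log(n / K)) (an assumption for illustration)."""
    n = len(y)
    costs = poisson_costs_by_k(y, k_max)
    crit = [c + beta * k * (1.0 + math.log(n / k))
            for k, c in enumerate(costs, start=1)]
    return 1 + crit.index(min(crit))
```

A call such as `select_n_segments(counts, k_max=10)` returns the selected number of segments for a list of non-negative integer counts. In practice the constant `beta` would need calibration, which this sketch does not attempt.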

preprint · 2013 · arXiv

Segmentor3IsBack: an R package for the fast and exact segmentation of Seq-data

Genome annotation is an important issue in biology which has long been addressed with gene prediction methods and manual experiments requiring biological expertise. The expanding Next Generation Sequencing technologies and their enhanced precision allow a new approach to the domain: the segmentation of RNA-Seq data to determine gene boundaries. Because of its almost linear complexity, we propose to use the Pruned Dynamic Programming (PDP) algorithm, whose performance has been acknowledged for CGH arrays, for Seq-experiment outputs. This requires the adaptation of the algorithm to the negative binomial distribution with which we model the data. We show that if the dispersion in the signal is known, the PDP algorithm can be used, and we provide an estimator for this dispersion. We then propose to estimate the number of segments, which can be associated with coding or non-coding regions of the genome, using an oracle penalty. We illustrate the results of our approach on a real dataset and show its good performance. Our algorithm is available as an R package on the CRAN repository.