Researcher profile

Gregory Nuel

Gregory Nuel contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) | Verification L1 | Unclaimed author
6 works
0 followers
3 topics
4 close collaborators

Published work

6 published items

Preprint, 2014, arXiv

Non-subjective power analysis to detect G*E interactions in Genome-Wide Association Studies in presence of confounding factor

It is generally acknowledged that most complex diseases are affected in part by interactions between genes and/or between genes and environmental factors. Taking into account environmental exposures and their interactions with genetic factors in genome-wide association studies (GWAS) can help to identify high-risk subgroups in the population and provide a better understanding of the disease. For this reason, many methods have been developed to detect gene-environment (G*E) interactions. Despite this, few loci that interact with environmental exposures have been identified so far. Indeed, the modest effect of G*E interactions, together with confounding factors, entails low statistical power to detect such interactions. In this work, we provide a simulated dataset in order to study methods for detecting G*E interactions in GWAS in the presence of a confounding factor and population structure. Our work applies a recently introduced non-subjective method for H1 simulations called waffect and exploits the publicly available HapMap project to build a dataset with real genotypes and population structures. We use this dataset to study the impact of confounding factors and compare the relative performance of popular methods such as PLINK, random forests and linear mixed models to detect G*E interactions. The presence of a confounding factor is an obstacle to detecting G*E interactions in GWAS, and the approaches considered in our power study all have insufficient power to detect the strong simulated interaction. Our simulated dataset could help to develop new methods which account for confounding factors through latent exposures in order to improve power.

Preprint, 2013, arXiv

Fast estimation of posterior probabilities in change-point models through a constrained hidden Markov model

The detection of change-points in heterogeneous sequences is a statistical challenge with applications across a wide variety of fields. In bioinformatics, a vast amount of methodology exists to identify an ideal set of change-points for detecting Copy Number Variation (CNV). While many efficient algorithms are currently available for finding the best segmentation of the data in CNV, relatively few approaches consider the important problem of assessing the uncertainty of the change-point location. Asymptotic and stochastic approaches exist but often require additional model assumptions to speed up the computations, while exact methods have quadratic complexity, which is usually intractable for large datasets of tens of thousands of points or more. In this paper, we suggest an exact method for obtaining the posterior distribution of change-points with linear complexity, based on a constrained hidden Markov model. The methods are implemented in the R package postCP, which uses the results of a given change-point detection algorithm to estimate the probability that each observation is a change-point. We present the results of the package on a publicly available CNV data set (n=120). Due to its frequentist framework, postCP obtains less conservative confidence intervals than previously published Bayesian methods, but with linear complexity instead of quadratic. Simulations showed that postCP provided comparable loss to a Bayesian MCMC method when estimating posterior means, specifically when assessing larger-scale changes, while being more computationally efficient. On another high-resolution CNV data set (n=14,241), the implementation processed information in less than one second on a mid-range laptop computer.
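The constrained hidden Markov model idea in the abstract above can be illustrated with a minimal Python sketch. This is not the postCP implementation (postCP is an R package); it is a hypothetical illustration that assumes Gaussian emissions with known per-segment means and a common standard deviation, a chain that starts in the first segment, ends in the last, and may only stay or advance by one segment at each step. The function name `change_point_posteriors` and all parameters are assumptions made for this example. The forward and backward passes each cost O(Kn) for K segments and n observations.

```python
import math

NEG_INF = float("-inf")

def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_gauss(x, mu, sd):
    """Log-density of N(mu, sd^2) at x (assumed emission model)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)

def change_point_posteriors(x, means, sd):
    """Return post[k][i]: posterior probability that segment k ends at index i.

    Assumes len(x) >= len(means). Every valid path has the same number of
    'stay' and 'advance' moves, so constant transition weights cancel and
    are omitted here.
    """
    n, K = len(x), len(means)
    le = [[log_gauss(xi, mu, sd) for mu in means] for xi in x]

    # Forward pass: F[i][k] = log weight of x[0..i] with state k at index i.
    F = [[NEG_INF] * K for _ in range(n)]
    F[0][0] = le[0][0]  # the chain must start in the first segment
    for i in range(1, n):
        for k in range(K):
            prev = F[i - 1][k]
            if k > 0:
                prev = logaddexp(prev, F[i - 1][k - 1])
            F[i][k] = prev + le[i][k]

    # Backward pass: B[i][k] = log weight of x[i+1..n-1] given state k at i.
    B = [[NEG_INF] * K for _ in range(n)]
    B[n - 1][K - 1] = 0.0  # the chain must end in the last segment
    for i in range(n - 2, -1, -1):
        for k in range(K):
            stay = B[i + 1][k] + le[i + 1][k]
            move = B[i + 1][k + 1] + le[i + 1][k + 1] if k + 1 < K else NEG_INF
            B[i][k] = logaddexp(stay, move)

    log_z = F[n - 1][K - 1]
    post = [[0.0] * n for _ in range(K - 1)]
    for k in range(K - 1):
        for i in range(n - 1):
            lp = F[i][k] + le[i + 1][k + 1] + B[i + 1][k + 1] - log_z
            post[k][i] = math.exp(lp)
    return post

# Toy usage: one change in mean between the third and fourth observation.
posteriors = change_point_posteriors([0.0, 0.1, -0.1, 5.0, 5.1, 4.9], [0.0, 5.0], 1.0)
print([round(p, 4) for p in posteriors[0]])
```

Each row of the result sums to one, because every admissible path contains exactly one transition out of segment k. In practice the segment means would come from a prior segmentation rather than being supplied by hand.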

Preprint, 2013, arXiv

Fast estimation of the ICL criterion for change-point detection problems with applications to Next-Generation Sequencing data

In this paper, we consider the Integrated Completed Likelihood (ICL) as a useful criterion for estimating the number of changes in the underlying distribution of data in problems where detecting the precise location of these changes is the main goal. The exact computation of the ICL requires O(Kn^2) operations (with K the number of segments and n the number of data-points), which is prohibitive in many practical situations with large sequences of data. We describe a framework to estimate the ICL with O(Kn) complexity. Our approach is general in the sense that it can accommodate any given model distribution. We checked the run-time and validity of our approach on simulated data and demonstrate its good performance when analyzing real Next-Generation Sequencing (NGS) data using a negative binomial model.

Preprint, 2013, arXiv

From GWAS to transcriptomics in prospective cancer design - new statistical challenges

Background. With the increasing interest in post-GWAS research, which represents a transition from genome-wide association discovery to analysis of functional mechanisms, attention has lately focused on the potential of including various biological material in epidemiological studies. In particular, exploration of the carcinogenic process through transcriptional analysis at the epidemiological level opens up new horizons in functional analysis and causal inference, and requires a new design together with adequate analysis procedures. Results. In this article, we present the post-genome design implemented in the NOWAC cohort as an example of a prospective nested case-control study built for transcriptomics use, and discuss analytical strategies to explore the changes occurring in transcriptomics during the carcinogenic process in association with questionnaire information. We emphasize the inadequacy of survival analysis models usually considered in GWAS for post-genome design, and propose instead to parameterize the gene trajectories during the carcinogenic process. Conclusions. This novel approach, in which transcriptomics are considered as potential intermediate biomarkers of cancer and exposures, offers a flexible framework which can include various biological assumptions.

Preprint, 2012, arXiv

Alternative Methods for H1 Simulations in Genome Wide Association Studies

Assessing the statistical power to detect susceptibility variants plays a critical role in GWA studies both from the prospective and retrospective points of view. Power is empirically estimated by simulating phenotypes under a disease model H1. For this purpose, the "gold" standard consists in simulating genotypes given the phenotypes (e.g. Hapgen). We introduce here an alternative approach for simulating phenotypes under H1 that does not require generating new genotypes for each simulation. In order to simulate phenotypes with a fixed total number of cases and under a given disease model, we suggest three algorithms: i) a simple rejection algorithm; ii) a numerical Markov Chain Monte-Carlo (MCMC) approach; and iii) an exact and efficient backward sampling algorithm. In our study, we validated the three algorithms both on a toy dataset and by comparing them with Hapgen on a more realistic dataset. As an application, we then conducted a simulation study on a 1000 Genomes Project dataset consisting of 629 individuals (314 cases) and 8,048 SNPs from Chromosome X. We arbitrarily defined an additive disease model with two susceptibility SNPs and an epistatic effect. The three algorithms are consistent, but backward sampling is dramatically faster than the other two. Our approach also gives results consistent with Hapgen. Using our application data, we showed that our limited design requires prior biological knowledge to limit the investigated region. We also proved that epistatic effects can play a significant role even when simple marker statistics (e.g. trend) are used. We finally showed that the overall performance of a GWA study strongly depends on the prevalence of the disease: the larger the prevalence, the better the power.
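The first of the three algorithms named in the abstract above, the simple rejection algorithm, can be sketched as follows. This is a hypothetical illustration rather than the authors' code: the single-SNP logistic disease model, the function name `simulate_phenotypes_rejection`, and the parameters `beta0` and `beta` are all assumptions made for the example. Phenotypes are drawn independently from the disease model given fixed genotypes, and a draw is kept only if it contains exactly the required number of cases.

```python
import math
import random

def simulate_phenotypes_rejection(genotypes, beta0, beta, n_cases, rng, max_tries=100_000):
    """Draw 0/1 phenotypes conditioned on exactly n_cases cases.

    genotypes: minor-allele counts (0, 1 or 2), one per individual.
    beta0, beta: intercept and per-allele effect of an assumed logistic model.
    """
    # Per-individual disease probability under the assumed logistic model.
    probs = [1.0 / (1.0 + math.exp(-(beta0 + beta * g))) for g in genotypes]
    for _ in range(max_tries):
        phenotypes = [1 if rng.random() < p else 0 for p in probs]
        # Accept only draws with the required total number of cases.
        if sum(phenotypes) == n_cases:
            return phenotypes
    raise RuntimeError("no draw accepted within max_tries")

# Toy usage: 8 individuals, 4 cases, genotypes kept fixed across simulations.
rng = random.Random(0)
genotypes = [0, 1, 2, 0, 1, 2, 0, 1]
print(simulate_phenotypes_rejection(genotypes, -0.5, 0.5, 4, rng))
```

The acceptance rate drops quickly as the sample grows, since the chance that independent draws hit the exact case count shrinks. This is consistent with the abstract's remark that backward sampling is dramatically faster.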

Preprint, 2012, arXiv

Hidden Markov Model Applications in Change-Point Analysis

The detection of change-points in heterogeneous sequences is a statistical challenge with many applications in fields such as finance, signal analysis and biology. A wide variety of literature exists for finding an ideal set of change-points for characterizing the data. In this tutorial we elaborate on the Hidden Markov Model (HMM) and present two different frameworks for applying HMM to change-point models. Then we provide a summary of two procedures for inference in change-point analysis, which are particular cases of the forward-backward algorithm for HMMs, and discuss common implementation problems. Lastly, we provide two examples of the HMM methods on available data sets and briefly discuss applications to current genomics studies. The R code used in the examples is provided in the appendix.