Source author record

Christine Sinoquet

Christine Sinoquet appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Applications Genomics

Catalog footprint

What is connected

3works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

Partially Hidden Markov Chain Linear Autoregressive model: inference and forecasting

Time series subject to change in regime have attracted much interest in domains such as econometry, finance or meteorology. For discrete-valued regimes, some models such as the popular Hidden Markov Chain (HMC) describe time series whose state process is unknown at all time-steps. Sometimes, time series are firstly labelled thanks to some annotation function. Thus, another category of models handles the case with regimes observed at all time-steps. We present a novel model which addresses the intermediate case: (i) state processes associated to such time series are modelled by Partially Hidden Markov Chains (PHMCs); (ii) a linear autoregressive (LAR) model drives the dynamics of the time series, within each regime. We describe a variant of the expection maximization (EM) algorithm devoted to PHMC-LAR model learning. We propose a hidden state inference procedure and a forecasting function that take into account the observed states when existing. We assess inference and prediction performances, and analyze EM convergence times for the new model, using simulated data. We show the benefits of using partially observed states to decrease EM convergence times. A fully labelled scheme with unreliable labels also speeds up EM. This offers promising prospects to enhance PHMC-LAR model selection. We also point out the robustness of PHMC-LAR to labelling errors in inference task, when large training datasets and moderate labelling error rates are considered. Finally, we highlight the remarkable robustness to error labelling in the prediction task, over the whole range of error rates.

preprint2014arXiv

A Survey on Latent Tree Models and Applications

In data analysis, latent variables play a central role because they help provide powerful insights into a wide variety of phenomena, ranging from biological to human sciences. The latent tree model, a particular type of probabilistic graphical models, deserves attention. Its simple structure - a tree - allows simple and efficient inference, while its latent variables capture complex relationships. In the past decade, the latent tree model has been subject to significant theoretical and methodological developments. In this review, we propose a comprehensive study of this model. First we summarize key ideas underlying the model. Second we explain how it can be efficiently learned from data. Third we illustrate its use within three types of applications: latent structure discovery, multidimensional clustering, and probabilistic inference. Finally, we conclude and give promising directions for future researches in this field.

preprint2012arXiv

Alternative Methods for H1 Simulations in Genome Wide Association Studies

Assessing the statistical power to detect susceptibility variants plays a critical role in GWA studies both from the prospective and retrospective points of view. Power is empirically estimated by simulating phenotypes under a disease model H1. For this purpose, the "gold" standard consists in simulating genotypes given the phenotypes (e.g. Hapgen). We introduce here an alternative approach for simulating phenotypes under H1 that does not require generating new genotypes for each simulation. In order to simulate phenotypes with a fixed total number of cases and under a given disease model, we suggest three algorithms: i) a simple rejection algorithm; ii) a numerical Markov Chain Monte-Carlo (MCMC) approach; iii) and an exact and efficient backward sampling algorithm. In our study, we validated the three algorithms both on a toy-dataset and by comparing them with Hapgen on a more realistic dataset. As an application, we then conducted a simulation study on a 1000 Genomes Project dataset consisting of 629 individuals (314 cases) and 8,048 SNPs from Chromosome X. We arbitrarily defined an additive disease model with two susceptibility SNPs and an epistatic effect. The three algorithms are consistent, but backward sampling is dramatically faster than the other two. Our approach also gives consistent results with Hapgen. Using our application data, we showed that our limited design requires a biological a priori to limit the investigated region. We also proved that epistatic effects can play a significant role even when simple marker statistics (e.g. trend) are used. We finally showed that the overall performance of a GWA study strongly depends on the prevalence of the disease: the larger the prevalence, the better the power.