Source author record

Peggy Cénac

Peggy Cénac appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

math.PR, math.ST, Statistics Theory, Computation, Data Structures and Algorithms, Machine Learning, math.DS, Quantitative Methods

Catalog footprint

What is connected

13 works

8 topics

4 close collaborators


preprint 2020 arXiv

An efficient Averaged Stochastic Gauss-Newton algorithm for estimating parameters of non linear regressions models

Non linear regression models are a standard tool for modeling real phenomena, with several applications in machine learning, ecology, econometrics... Estimating the parameters of the model has garnered a lot of attention for many years. We focus here on a recursive method for estimating the parameters of non linear regressions. Indeed, these kinds of methods, the most famous of which are probably the stochastic gradient algorithm and its averaged version, make it possible to deal efficiently with massive data arriving sequentially. Nevertheless, they can be, in practice, very sensitive to the case where the eigenvalues of the Hessian of the functional we would like to minimize are at different scales. To avoid this problem, we first introduce an online Stochastic Gauss-Newton algorithm. In order to improve the behavior of the estimates in case of bad initialization, we also introduce a new Averaged Stochastic Gauss-Newton algorithm and prove its asymptotic efficiency.

preprint 2020 arXiv

Variable Length Memory Chains: characterization of stationary probability measures

Variable Length Memory Chains (VLMC), which are generalizations of finite order Markov chains, turn out to be an essential tool to model random sequences in many domains, as well as an interesting object in contemporary probability theory. The question of the existence of stationary probability measures leads us to introduce a key combinatorial structure for words produced by a VLMC: the Longest Internal Suffix. This notion allows us to state a necessary and sufficient condition for a general VLMC to admit a unique invariant probability measure. This condition turns out to take a much simpler form for a subclass of VLMC: the stable VLMC. This natural subclass, unlike the general case, enjoys a renewal property. Namely, a stable VLMC induces a semi-Markov chain on an at most countable state space. Unfortunately, this discrete time renewal process does not contain the whole information of the VLMC, which prevents the study of a stable VLMC from being reduced to the study of its induced semi-Markov chain. For a subclass of stable VLMC, the convergence in distribution of a VLMC towards its stationary probability measure is established. Finally, finite state space semi-Markov chains turn out to be very special stable VLMC, shedding some new light on their limit distributions.

preprint 2016 arXiv

Persistent random walks. II. Functional Scaling Limits

We give a complete and unified description -- under some stability assumptions -- of the functional scaling limits associated with some persistent random walks for which the recurrent or transient type is studied in [1]. As a result, we highlight a phase transition phenomenon with respect to the memory. It turns out that the limit process is either Markovian or not according to -- to put it in a nutshell -- the rate of decrease of the distribution tails corresponding to the persistent times. In the memoryless situation, the limits are classical strictly stable Lévy processes of infinite variation. However, we point out that the description of the critical Cauchy case fills a gap even in the closely related context of Directionally Reinforced Random Walks (DRRWs), for which it has not been considered yet. Besides, we need to introduce a relevant generalized drift -- extending the classical one -- in order to study the critical case but also the situation when the limit is no longer Markovian. It appears to be in full generality a drift in mean for the Persistent Random Walk (PRW). The limit processes keeping some memory -- given by some variable length Markov chain -- of the underlying PRW are called arcsine Lamperti anomalous diffusions due to their marginal distributions, which are computed explicitly here. To this end, we make the connection with the governing equations for Lévy walks, the occupation times of skew Bessel processes and a more general class modelled on Lamperti processes. We also stress that we clarify some misunderstanding regarding this marginal distribution in the framework of DRRWs. Finally, we stress that the latter situation is more flexible -- as in the first paper -- in the sense that the results can be easily generalized to a wider class of PRWs without renewal pattern.

preprint 2015 arXiv

Online estimation of the geometric median in Hilbert spaces : non asymptotic confidence balls

Estimation procedures based on recursive algorithms are interesting and powerful techniques that are able to deal rapidly with (very) large samples of high dimensional data. The collected data may be contaminated by noise so that robust location indicators, such as the geometric median, may be preferred to the mean. In this context, an estimator of the geometric median based on a fast and efficient averaged non linear stochastic gradient algorithm has been developed by Cardot, Cénac and Zitt (2013). This work aims at studying more precisely the non asymptotic behavior of this algorithm by giving non asymptotic confidence balls. This new result is based on the derivation of improved $L^2$ rates of convergence as well as an exponential inequality for the martingale terms of the recursive non linear Robbins-Monro algorithm.

preprint 2015 arXiv

Persistent random walks

We consider a walker that at each step keeps the same direction with a probability that depends on the time already spent in the direction the walker is currently moving. In this paper, we study some asymptotic properties of this persistent random walk and give the conditions of recurrence or transience in terms of "transition" probabilities to keep the same direction or to change, without assuming that the latter admits any stationary probability. Examples are exhibited where this process is recurrent even if the random walk is not symmetric.
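The dynamics described in this abstract can be simulated in a few lines. The following is a minimal sketch, not the authors' code: the function name `persistent_walk`, the callback `keep_prob(direction, run_length)` and the choice of a uniformly random first direction are all assumptions made for illustration.

```python
import random


def persistent_walk(n_steps, keep_prob, seed=None):
    """Simulate a one-dimensional persistent random walk (illustrative sketch).

    keep_prob(direction, run_length) is a hypothetical callback returning the
    probability of keeping the current direction (+1 or -1) after run_length
    consecutive steps in that direction.
    """
    rng = random.Random(seed)
    direction = rng.choice((-1, 1))  # assumption: first direction is uniform
    run_length = 1                   # steps already spent in this direction
    position = direction
    path = [0, position]
    for _ in range(n_steps - 1):
        # Change direction with probability 1 - keep_prob(...).
        if rng.random() >= keep_prob(direction, run_length):
            direction = -direction
            run_length = 0
        run_length += 1
        position += direction
        path.append(position)
    return path
```

For instance, `persistent_walk(1000, lambda d, k: 1 - 1 / (k + 1), seed=0)` gives a walk whose tendency to keep going grows with the time already spent in the current direction, and a callback that depends on `d` gives a non-symmetric walk.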

preprint 2012 arXiv

Almost sure central limit theorems for random ratios and applications to LSE for fractional Ornstein-Uhlenbeck processes

We investigate an almost sure central limit theorem (ASCLT) for sequences of random variables having the form of a ratio of two terms such that the numerator satisfies the ASCLT and the denominator is a positive term which converges almost surely to 1. This result leads to the ASCLT for least squares estimators for Ornstein-Uhlenbeck processes driven by fractional Brownian motion.

preprint 2012 arXiv

Persistent random walks, variable length Markov chains and piecewise deterministic Markov processes

A classical random walk $(S_t, t\in\mathbb{N})$ is defined by $S_t:=\displaystyle\sum_{n=0}^t X_n$, where $(X_n)$ are i.i.d. When the increments $(X_n)_{n\in\mathbb{N}}$ are a first-order Markov chain, a short memory is introduced in the dynamics of $(S_t)$. This so-called "persistent" random walk is no longer Markovian and, under suitable conditions, the rescaled process converges towards the integrated telegraph noise (ITN) as the time-scale and space-scale parameters tend to zero (see Herrmann and Vallois, 2010; Tapiero-Vallois, Tapiero-Vallois2). The ITN process is effectively non-Markovian too. The aim is to consider persistent random walks $(S_t)$ whose increments are Markov chains with variable order which can be infinite. This variable memory is highlighted by a one-to-one correspondence between $(X_n)$ and a suitable Variable Length Markov Chain (VLMC), since for a VLMC the dependency on the past can be unbounded. The key fact is to consider the non Markovian letter process $(X_n)$ as the margin of a couple $(X_n,M_n)_{n\ge 0}$ where $(M_n)_{n\ge 0}$ stands for the memory of the process $(X_n)$. We prove that, under a suitable rescaling, $(S_n,X_n,M_n)$ converges in distribution towards a continuous-time process $(S^0(t),X(t),M(t))$. The process $(S^0(t))$ is a semi-Markov and Piecewise Deterministic Markov Process whose paths are piecewise linear.

preprint 2012 arXiv

Recursive estimation of the conditional geometric median in Hilbert spaces

A recursive estimator of the conditional geometric median in Hilbert spaces is studied. It is based on a stochastic gradient algorithm whose aim is to minimize a weighted L1 criterion and is consequently well adapted for robust online estimation. The weights are controlled by a kernel function and an associated bandwidth. Almost sure convergence and L2 rates of convergence are proved under general conditions on the conditional distribution as well as the sequence of descent steps of the algorithm and the sequence of bandwidths. Asymptotic normality is also proved for the averaged version of the algorithm with an optimal rate of convergence. A simulation study confirms the interest of this new and fast algorithm when the sample sizes are large. Finally, the ability of these recursive algorithms to deal with very high-dimensional data is illustrated on the robust estimation of television audience profiles conditional on the total time spent watching television over a period of 24 hours.

preprint 2011 arXiv

A fast and recursive algorithm for clustering large datasets with $k$-medians

Clustering large samples of high dimensional data with fast algorithms is an important challenge in computational statistics. Borrowing ideas from MacQueen (1967), who introduced a sequential version of the $k$-means algorithm, a new class of recursive stochastic gradient algorithms designed for the $k$-medians loss criterion is proposed. By their recursive nature, these algorithms are very fast and are well adapted to deal with large samples of data that are allowed to arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. Particular attention is paid to the averaged versions, which are known to have better performance, and a data-driven procedure that allows automatic selection of the value of the descent step is proposed. The performance of the averaged sequential estimator is compared in a simulation study, both in terms of computation speed and accuracy of the estimations, with more classical partitioning techniques such as $k$-means, trimmed $k$-means and PAM (partitioning around medoids). Finally, this new online clustering technique is illustrated on determining television audience profiles with a sample of more than 5000 individual television audiences measured every minute over a period of 24 hours.
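A MacQueen-style recursive update for the $k$-medians criterion might look like the sketch below. It is only an illustration under stated assumptions, not the algorithm as specified in the paper: the function name `recursive_k_medians`, the step sequence `c_gamma / count ** alpha` with per-cluster counts, and the requirement that initial centers are supplied by the caller are all choices made here.

```python
import math


def recursive_k_medians(points, init_centers, c_gamma=1.0, alpha=0.66):
    """One pass of a sequential k-medians sketch; returns (centers, averaged)."""
    centers = [list(c) for c in init_centers]
    averaged = [list(c) for c in init_centers]  # running means of the iterates
    counts = [0] * len(centers)
    for x in points:
        # Assign the new observation to its nearest current center.
        dists = [math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
                 for c in centers]
        r = min(range(len(centers)), key=dists.__getitem__)
        counts[r] += 1
        if dists[r] > 0.0:
            # Move only that center, by a step along the unit vector to x.
            step = c_gamma / counts[r] ** alpha  # assumed step sequence
            centers[r] = [ci + step * (xi - ci) / dists[r]
                          for ci, xi in zip(centers[r], x)]
        # Averaged version: recursive mean of the iterates of center r.
        averaged[r] = [a + (ci - a) / (counts[r] + 1)
                       for a, ci in zip(averaged[r], centers[r])]
    return centers, averaged
```

Each observation is touched once and only one center moves per step, which is what makes this kind of procedure cheap on data arriving sequentially. Because the criterion is not convex, the result depends on the initial centers.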

preprint 2011 arXiv

Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm

With the progress of measurement apparatus and the development of automatic sensors, it is no longer unusual to get thousands of samples of observations taking values in high dimensional spaces such as functional spaces. In such large samples of high dimensional data, outlying curves may not be uncommon and even a few individuals may corrupt simple statistical indicators such as the mean trajectory. We focus here on the estimation of the geometric median, which is a direct generalization of the real median and has nice robustness properties. The geometric median being defined as the minimizer of a simple convex functional that is differentiable everywhere when the distribution has no atoms, it is possible to estimate it with online gradient algorithms. Such algorithms are very fast and can deal with large samples. Furthermore, they can also be simply updated when the data arrive sequentially. We state the almost sure consistency and the L2 rates of convergence of the stochastic gradient estimator as well as the asymptotic normality of its averaged version. We get that the asymptotic distribution of the averaged version of the algorithm is the same as that of the classic estimators, which are based on the minimization of the empirical loss function. The performance of our averaged sequential estimator, both in terms of computation speed and accuracy of the estimations, is evaluated with a small simulation study. Our approach is also illustrated on a sample of more than 5000 individual television audiences measured every second over a period of 24 hours.
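As an illustration of the kind of online gradient scheme this abstract describes, here is a minimal finite-dimensional sketch. It assumes the usual characterization of the geometric median as a minimizer of $m \mapsto E\|X - m\|$; the function name `averaged_sgd_median`, the step sequence `c_gamma / n ** alpha` and the initialization at the first observation are assumptions, not details taken from the paper.

```python
import math


def averaged_sgd_median(samples, c_gamma=1.0, alpha=0.66):
    """Return (last SGD iterate, averaged iterate) for the geometric median."""
    it = iter(samples)
    m = list(next(it))   # assumption: start from the first observation
    m_bar = list(m)      # running average of the iterates
    for n, x in enumerate(it, start=1):
        diff = [xi - mi for xi, mi in zip(x, m)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:
            # Stochastic gradient step: move along the unit vector towards x.
            step = c_gamma / n ** alpha
            m = [mi + step * d / norm for mi, d in zip(m, diff)]
        # Averaging step: recursive mean of the iterates m_0, ..., m_n.
        m_bar = [b + (mi - b) / (n + 1) for b, mi in zip(m_bar, m)]
    return m, m_bar
```

Each update costs one pass over the coordinates of a single observation, so the estimate can be refreshed as data arrive. Since every step has bounded length `step`, a single outlying observation can only shift the iterate by that amount, which is one way to see the robustness mentioned above.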

preprint 2011 arXiv

Uncommon Suffix Tries

Common assumptions on the source producing the words inserted in a suffix trie with $n$ leaves lead to a $\log n$ height and saturation level. We provide an example of a suffix trie whose height increases faster than a power of $n$ and another one whose saturation level is negligible with respect to $\log n$. Both are built from VLMC (Variable Length Markov Chain) probabilistic sources; they are easily extended to families of sources having the same properties. The first example corresponds to a "logarithmic infinite comb" and enjoys a non uniform polynomial mixing. The second one corresponds to a "factorial infinite comb" for which mixing is uniform and exponential.

preprint 2010 arXiv

Variable length Markov chains and dynamical sources

Infinite random sequences of letters can be viewed as stochastic chains or as strings produced by a source, in the sense of information theory. The relationship between Variable Length Markov Chains (VLMC) and probabilistic dynamical sources is studied. We establish a probabilistic frame for context trees and VLMC and we prove that any VLMC is a dynamical source for which we explicitly build the mapping. On two examples, the "comb" and the "bamboo blossom", we find a necessary and sufficient condition for the existence and the uniqueness of a stationary probability measure for the VLMC. These two examples are detailed in order to provide the associated Dirichlet series as well as the generating functions of word occurrences.

preprint 2006 arXiv

Digital search trees and chaos game representation

In this paper, we consider a possible representation of a DNA sequence in a quaternary tree, in which one can visualize repetitions of subwords. The CGR-tree turns a sequence of letters into a digital search tree (DST), obtained from the suffixes of the reversed sequence. Several results are known concerning the height and the insertion depth for DSTs built from i.i.d. successive sequences. Here, the successive inserted words are strongly dependent. We give the asymptotic behaviour of the insertion depth and of the length of branches for the CGR-tree obtained from the suffixes of reversed i.i.d. or Markovian sequences. This behaviour turns out to be, at first order, the same as in the case of independent words. As a by-product, asymptotic results on the length of longest runs in a Markovian sequence are obtained.
