Source author record

Bernard Ycart

Bernard Ycart appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: math.PR; Populations and Evolution; math.HO; Quantitative Methods; Computation and Language; Genomics; Methodology; Molecular Networks; Applications; Cryptography and Security

Catalog footprint

What is connected

21 works

10 topics

4 close collaborators

Preprint, 2016, arXiv

Computing wedge probabilities

A new formula for the probability that a standard Brownian motion stays between two linear boundaries is proved. A simple algorithm is deduced. Uniform precision estimates are computed. Different implementations have been made available online as R packages.

Preprint, 2015, arXiv

Gärtner-Ellis condition for squared asymptotically stationary Gaussian processes

The Gärtner-Ellis condition for the square of an asymptotically stationary Gaussian process is established. The same limit holds for the conditional distribution given any fixed initial point, which entails weak multiplicative ergodicity. The limit is shown to be the Laplace transform of a convolution of Gamma distributions with Poisson compound of exponentials. A proof based on Wiener-Hopf factorization induces a probabilistic interpretation of the limit in terms of a regression problem.

Preprint, 2014, arXiv

1827 : la mode de la statistique en France; origine, extension, personnages

Largely independently of the scientific development of the discipline, a fashion for statistics developed in France from 1827 onward. It was probably sparked by Charles Dupin's 'Carte figurative de l'instruction populaire', with its famous Saint-Malo-Geneva line, supposed to separate the educated North from the ignorant South. It became attractive to produce, under the name 'statistics', more or less quantitative descriptions of any subject. Beyond literary records, the phenomenon can be measured by its semantic penetration in the press. Even if the ambition of most of these amateurs remained strictly descriptive, some of them did raise the issue of proving through numbers. This is particularly remarkable, since within institutional science, the techniques of statistical proof, which had been introduced by Laplace at the end of the 18th century, remained largely ignored for a very long time.

Preprint, 2014, arXiv

Jakob Bielfeld (1717–1770) and the diffusion of statistical concepts in eighteenth century Europe

Published between 1760 and 1770, Bielfeld's writings prove that scholars of the time were acquainted with the concepts of both political arithmetic and German statistik, long before they merged into a new discipline at the beginning of the following century. It is argued here that these works may have been an important source of diffusion of statistical concepts at the end of the eighteenth century. Bielfeld is now almost completely forgotten, and the reasons for his lack of fame in posterity are examined.

Preprint, 2014, arXiv

Large scale statistical analysis of GEO datasets

The problem addressed here is that of simultaneous treatment of several gene expression datasets, possibly collected under different experimental conditions and/or platforms. Using robust statistics, a large scale statistical analysis has been conducted over 20 datasets downloaded from the Gene Expression Omnibus repository. The differences between datasets are compared to the variability inside a given dataset. Evidence that meaningful biological information can be extracted by merging different sources is provided.

Preprint, 2014, arXiv

Simultaneous growth of two cancer cell lines evidences variability in growth rates

Cancer cells co-cultured in vitro reveal unexpected differential growth rates that classical exponential growth models cannot account for. Two non-interacting cell lines were grown in the same culture, and counts of each species were recorded at periodic times. The relative growth of population ratios was found to depend on the initial proportion, in contradiction with the traditional exponential growth model. The proposed explanation is the variability of growth rates for clones inside the same cell line. This leads to a log-quadratic growth model that provides both a theoretical explanation of the observed phenomenon and a better fit to our growth data.

Preprint, 2014, arXiv

Weighted Kolmogorov Smirnov testing: an alternative for Gene Set Enrichment Analysis

Gene Set Enrichment Analysis (GSEA) is a basic tool for genomic data treatment. From a statistical point of view, the centering of its test statistic does not allow the derivation of asymptotic results. A test statistic with a different centering is proposed. Under the null hypothesis, the convergence in distribution of the new test statistic is proved, using the theory of empirical processes. The limiting distribution can be computed by Monte-Carlo simulation. The test defined in this way has been called the Weighted Kolmogorov Smirnov (WKS) test. Because the same evaluation of the asymptotic distribution serves for many different gene sets, computing times are shorter. Using expression data from the GEO repository, tested against the MSig Database C2, a comparison between the classical GSEA test and the new procedure has been conducted. Our conclusion is that, beyond its mathematical and algorithmic advantages, the WKS test could be more informative in many cases than the classical GSEA test.

Preprint, 2013, arXiv

Bounds for left and right window cutoffs

The location and width of the time window in which a sequence of processes converges to equilibrium are given under conditions of exponential convergence. The location depends on the side: the left-window and right-window cutoffs may have different locations. Bounds on the distance to equilibrium are given for both sides. Examples prove that the bounds are tight.

Preprint, 2013, arXiv

Exponential growth of bifurcating processes with ancestral dependence

Branching processes are classical growth models in cell kinetics. In their construction, it is usually assumed that cell lifetimes are independent random variables, which has been proved false in experiments. Models of dependent lifetimes are considered here, in particular bifurcating Markov chains. Under hypotheses of stationarity and multiplicative ergodicity, the corresponding branching process is proved to have the same type of asymptotics as its classic counterpart in the i.i.d. supercritical case: the cell population grows exponentially, the growth rate being related to the exponent of multiplicative ergodicity, in a similar way as to the Laplace transform of lifetimes in the i.i.d. case. An identifiable model for which the multiplicative ergodicity coefficients and the growth rate can be explicitly computed is proposed.

Preprint, 2013, arXiv

Exponential transform of quadratic functional and multiplicative ergodicity of a Gauss-Markov process

The Laplace transform of partial sums of the square of a non-centered Gauss-Markov process, conditioning on its starting point, is explicitly computed. The parameters of multiplicative ergodicity are deduced.

Preprint, 2013, arXiv

Fluctuation analysis with cell deaths

The classical Luria-Delbrück model for fluctuation analysis is extended to the case where cells can either divide or die at the end of their generation time. This leads to a family of probability distributions generalizing the Luria-Delbrück family, and depending on three parameters: the expected number of mutations, the relative fitness of normal cells compared to mutants, and the death probability of mutants. The probabilistic treatment is similar to that of the classical case; simulation and computing algorithms are provided. The estimation problem is discussed: if the death probability is known, the two other parameters can be reliably estimated. If the death probability is unknown, the model can be identified only for large samples.

Preprint, 2013, arXiv

Fluctuation analysis: can estimates be trusted?

The estimation of mutation probabilities and relative fitnesses in fluctuation analysis is based on the unrealistic hypothesis that the single-cell times to division are exponentially distributed. Using the classical Luria-Delbrück distribution outside its modelling hypotheses induces an important bias on the estimation of the relative fitness. The model is extended here to any division time distribution. Mutant counts follow a generalization of the Luria-Delbrück distribution, which depends on the mean number of mutations, the relative fitness of normal cells compared to mutants, and the division time distribution of mutant cells. Empirical probability generating function techniques yield precise estimates both of the mean number of mutations and the relative fitness of normal cells compared to mutants. In the case where no information is available on the division time distribution, it is shown that the estimation procedure using constant division times yields more reliable results. Numerical results both on observed and simulated data are reported.

Preprint, 2013, arXiv

Simulation of Gene Regulatory Networks

This limited review is intended as an introduction to the fast growing subject of mathematical modelling of cell metabolism and its biochemical pathways, and more precisely on pathways linked to apoptosis of cancerous cells. Some basic mathematical models of chemical kinetics, with emphasis on stochastic models, are presented.

Preprint, 2013, arXiv

Some mathematical tools for the Lenski experiment

The Lenski experiment is a long-term daily reproduction of Escherichia coli that has evidenced phenotypic and genetic evolution over the years. Some mathematical models that could be useful in understanding the results of that experiment are reviewed here: stochastic and deterministic growth, mutation appearance and fixation, competition of species.

Preprint, 2013, arXiv

Statistical data mining for symbol associations in genomic databases

A methodology is proposed to automatically detect significant symbol associations in genomic databases. A new statistical test is proposed to assess the significance of a group of symbols when found in several genesets of a given database. Applied to symbol pairs, the thresholded p-values of the test define a graph structure on the set of symbols. The cliques of that graph are significant symbol associations, linked to a set of genesets where they can be found. The method can be applied to any database, and is illustrated on the MSigDB C2 database. Many of the symbol associations detected in C2 or in non-specific selections corresponded to already known interactions. On more specific selections of C2, many previously unknown symbol associations have been detected. These associations unveil new candidates for gene or protein interactions, needing further investigation for biological evidence.

Preprint, 2012, arXiv

A case of mathematical eponymy: the Vandermonde determinant

We study the historical process that led to the worldwide adoption, throughout mathematical research papers and textbooks, of the denomination "Vandermonde determinant". The mathematical object can be related to two passages in Vandermonde's writings, of which one inspired Cauchy's definition of determinants. Influential citations of Cauchy and Jacobi may have initiated the naming process. It started during the second half of the 19th century as a pedagogical practice in France. The spread in textbooks and research journals began during the first half of the 20th century, and only reached full acceptance after the 1960s. The naming process is still ongoing, in the sense that the volume of publications using the denomination grows significantly faster than the overall volume of the field.

Preprint, 2012, arXiv

Alberti's letter counts

Four centuries before modern statistical linguistics was born, Leon Battista Alberti (1404–1472) compared the frequency of vowels in Latin poems and orations, making the first quantified observation of a stylistic difference ever. Using a corpus of 20 Latin texts (over 5 million letters), Alberti's observations are statistically assessed. Letter counts prove that poets used significantly more a's, e's, and y's, whereas orators used more of the other vowels. The sample sizes needed to justify the assertions are studied, and proved to be within reach for Alberti's scholarship.

Preprint, 2012, arXiv

Letter counting: a stem cell for Cryptology, Quantitative Linguistics, and Statistics

Counting letters in written texts is a very ancient practice. It has accompanied the development of Cryptology, Quantitative Linguistics, and Statistics. In Cryptology, counting frequencies of the different characters in an encrypted message is the basis of the so-called frequency analysis method. In Quantitative Linguistics, the proportion of vowels to consonants in different languages was studied long before authorship attribution. In Statistics, the alternation of vowels and consonants was the only example that Markov ever gave of his theory of chained events. A short history of letter counting is presented. The three domains, Cryptology, Quantitative Linguistics, and Statistics, are then examined, focusing on the interactions with the other two fields through letter counting. As a conclusion, the eclecticism of scholars of past centuries, their background in humanities, and their familiarity with cryptograms, are identified as contributing factors to the mutual enrichment process which is described here.
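As an illustration of the letter counting described in this abstract, the following is a minimal Python sketch of relative letter frequencies and of the vowel proportion in a text. The function names and the default vowel set "aeiouy" are assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter


def letter_frequencies(text):
    """Return the relative frequency of each letter, ignoring case and non-letters."""
    letters = [ch.lower() for ch in text if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    return {ch: n / total for ch, n in counts.items()}


def vowel_proportion(text, vowels="aeiouy"):
    """Return the share of letters that are vowels (the vowel set is an assumption)."""
    letters = [ch.lower() for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in vowels for ch in letters) / len(letters)
```

Sorting the output of letter_frequencies by decreasing value gives the ranking that frequency analysis compares against the known letter ranking of a language.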

Preprint, 2012, arXiv

Statistics for the Luria-Delbrück distribution

The Luria-Delbrück distribution is a classical model of mutations in cell kinetics. It is obtained as a limit when the probability of mutation tends to zero and the number of divisions to infinity. It can be interpreted as a compound Poisson distribution (for the number of mutations) of exponential mixtures (for the developing time of mutant clones) of geometric distributions (for the number of cells produced by a mutant clone in a given time). The probabilistic interpretation, and a rigorous proof of convergence in the general case, are deduced from classical results on Bellman-Harris branching processes. The two parameters of the Luria-Delbrück distribution are the expected number of mutations, which is the parameter of interest, and the relative fitness of normal cells compared to mutants, which is the heavy tail exponent. Both can be simultaneously estimated by the maximum likelihood method. However, the computation becomes numerically unstable as soon as the maximal value of the sample is large, which occurs frequently due to the heavy tail property. Based on the empirical generating function, robust estimators are proposed and their asymptotic variance is given. They are comparable in precision to maximum likelihood estimators, with a much broader range of calculability, a better numerical stability, and a negligible computing time.
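The compound Poisson structure described in this abstract can be sketched as a simple sampler. This is a minimal illustration under stated assumptions, not the author's code: time is scaled so that mutant clones grow at rate 1, the developing time of a clone is taken as exponential with rate rho (the relative fitness), and the clone size after time t is geometric with success probability exp(-t). The names alpha, rho and all function names are chosen for this example.

```python
import math
import random


def sample_poisson(rng, mean):
    """Poisson draw: count unit-rate exponential arrivals before time `mean`."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_clone_size(rng, rho):
    """Size of one mutant clone: geometric on {1, 2, ...} with parameter exp(-T),
    where T is exponential with rate rho (assumed parameterization)."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    t = rng.expovariate(rho)
    # Clamp to avoid division by zero or overflow; this only affects
    # astronomically large clones that floats cannot represent anyway.
    p = max(math.exp(-t), 1e-300)
    if p >= 1.0:
        return 1
    u = 1.0 - rng.random()  # uniform on (0, 1]
    # Inversion: P(size > k) = (1 - p) ** k
    return 1 + int(math.log(u) / math.log1p(-p))


def sample_luria_delbruck(rng, alpha, rho):
    """Total mutant count: Poisson(alpha) mutations, each founding an independent clone."""
    n_mutations = sample_poisson(rng, alpha)
    return sum(sample_clone_size(rng, rho) for _ in range(n_mutations))
```

Under these assumptions and with rho = 1, a single clone has size k with probability 1/(k(k+1)), so about half of the clones have size 1 and the tail is heavy, which is why large sample maxima are frequent.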

Preprint, 2006, arXiv

A zero-one law for first-order logic on random images

For an $n\times n$ random image with independent pixels, black with probability $p(n)$ and white with probability $1-p(n)$, the probability of satisfying any given first-order sentence tends to 0 or 1, provided both $p(n)n^{\frac{2}{k}}$ and $(1-p(n))n^{\frac{2}{k}}$ tend to 0 or $+\infty$, for any integer $k$. The result is proved by computing the threshold function for basic local sentences, and applying Gaifman's theorem.

Preprint, 2006, arXiv

Image denoising by statistical area thresholding

Area openings and closings are morphological filters which efficiently suppress impulse noise from an image, by removing small connected components of level sets. The problem of an objective choice of threshold for the area remains open. Here, a mathematical model for random images will be considered. Under this model, a Poisson approximation for the probability of appearance of any local pattern can be computed. In particular, the probability of observing a component with size larger than $k$ in pure impulse noise has an explicit form. This permits the definition of a statistical test on the significance of connected components, thus providing an explicit formula for the area threshold of the denoising filter, as a function of the impulse noise probability parameter. Finally, using threshold decomposition, a denoising algorithm for grey level images is proposed.
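The area opening mentioned in this abstract can be sketched for a binary image as follows. This is a minimal illustration, assuming 4-connectivity and a caller-supplied threshold min_area; the paper's explicit threshold formula is not reproduced here, and the function name is chosen for this example.

```python
from collections import deque


def area_opening(image, min_area):
    """Keep only the connected components of 1-pixels with at least `min_area` pixels.

    `image` is a list of equal-length rows containing 0 or 1.
    Components are 4-connected (an assumption of this sketch).
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    out = [[0] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Small components are treated as impulse noise and dropped.
            if len(component) >= min_area:
                for y, x in component:
                    out[y][x] = 1
    return out
```

An area closing can be obtained from the same function by applying it to the inverted image and inverting the result. For grey level images, the threshold decomposition named in the abstract would apply such a filter to each level set.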
