Source author record

Richard Arratia

Richard Arratia appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

16works
5topics
4close collaborators

Actions

Connect this record

Log in to claim

Research graph

See the researcher in context

Open full explorer

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

16 published item(s)

preprint2016arXiv

A countdown process, with application to the rank of random matrices over $\mathbb F_q(n)$

Motivated by the work of Fulman and Goldstein, comparing the distribution of the corank of random matrices in $\mathbb F_q[n]$ with the limit distribution as $n \to \infty$, we define a countdown process, driven by independent geometric random variables related to random integer partitions. Analysis of this process leads to sharper bounds on the total variation distance.

preprint2016arXiv

Completely effective error bounds for Stirling Numbers of the first and second kind via Poisson Approximation

We provide completely effective error estimates for Stirling numbers of the first and second kind, denoted by $s(n,m)$ and $S(n,m)$, respectively. These bounds are useful for values of $m \geq n - O(\sqrt{n})$. An application of our Theorem 5 yields, for example, \[ s(10^{12},\ 10^{12}-2\times 10^6)/10^{35664464} \in [ 1.87669, 1.876982 ], \] \[ S(10^{12},\ 10^{12}-2\times 10^6)/10^{35664463} \in [ 1.30121, 1.306975 ]. \] The bounds are obtained via Chen-Stein Poisson approximation, using an interpretation of Stirling numbers as the number of ways of placing non-attacking rooks on a chess board. As a corollary to Theorem 5, summarized in Proposition 1, we obtain two simple and explicit asymptotic formulas, one for each of $s(n,m)$ and $S(n,m)$, for the parametrization $m = n - t\, n^a$, $0 \leq a \leq \frac{1}{2}.$ These asymptotic formulas agree with the ones originally observed by Moser and Wyman in the range $0<a<\frac{1}{2}$, and they connect with a recent asymptotic expansion by Louchard for $\frac{1}{2}<a < 1$, hence filling the gap at $a = \frac{1}{2}$. We also provide a generalization applicable to rook and file numbers.

preprint2016arXiv

On the central role of the scale invariant Poisson processes on (0,infty)

The scale invariant Poisson processes on (0,infty) play a central but mildly disguised role in number theory, combinatorics, and genetics. They give the continuous limits which underly and unify diverse discrete structures, including the prime factorization of a uniformly chosen integer, the factorization of polynomials over finite fields, the decomposition into cycles of random permutations, the decomposition into components of random mappings, and the Ewens sampling formula. They deserve attention as one of the fundamental and central objects of probability theory.

preprint2016arXiv

Poisson and independent process approximation for random combinatorial structures with a given number of components, and near-universal behavior for low rank assemblies

We give a general framework for approximations to combinatorial assemblies, especially suitable to the situation where the number $k$ of components is specified, in addition to the overall size $n$. This involves a Poisson process, which, with the appropriate choice of parameter, may be viewed as an extension of saddlepoint approximation. We illustrate the use of this by analyzing the component structure when the rank and size are specified, and the rank, $r := n-k$, is small relative to $n$. There is near-universal behavior, in the sense that apart from cases where the exponential generating function has radius of convergence zero, for $\ell=1,2,\dots$, when $r \asymp n^α$ for fixed $α\in (\frac{\ell}{\ell+1}, \frac{\ell+1}{\ell+2})$, the size $L_1$ of the largest component converges in probabiity to $\ell+2$. Further, when $r \sim t\, n^{\ell/(\ell+1)}$ for a positive integer $\ell$, and $t \in (0,\infty)$, $\mathbb{P}\,(L_1 \in \{\ell+1,\ell+2\}) \to 1$, with the choice governed by a Poisson limit distribution for the number of components of size $\ell+2$. This was previously observed, for the case $\ell=1$ and the special cases of permutations and set partitions, using Chen-Stein approximations for the indicators of attacks and alignments, when rooks are placed randomly on a triangular board. The case $\ell=1$ is especially delicate, and was not handled by previous saddlepoint approximations.

preprint2015arXiv

Probabilistic divide-and-conquer: a new exact simulation method, with integer partitions as an example

We propose a new method, probabilistic divide-and-conquer, for improving the success probability in rejection sampling. For the example of integer partitions, there is an ideal recursive scheme which improves the rejection cost from asymptotically order $n^{3/4}$ to a constant. We show other examples for which a non--recursive, one--time application of probabilistic divide-and-conquer removes a substantial fraction of the rejection sampling cost. We also present a variation of probabilistic divide-and-conquer for generating i.i.d. samples that exploits features of the coupon collector's problem, in order to obtain a cost that is sublinear in the number of samples.

preprint2015arXiv

Some people have all the luck

We look at the Florida Lottery records of winners of prizes worth $600 or more. Some individuals claimed large numbers of prizes. Were they lucky, or up to something? We distinguish the "plausibly lucky" from the "implausibly lucky" by solving optimization problems that take into account the particular games each gambler won, where plausibility is determined by finding the minimum expenditure so that if every Florida resident spent that much, the chance that any of them would win as often as the gambler did would still be less than one in a million. Dealing with dependent bets relies on the BKR inequality; solving the optimization problem numerically relies on the log-concavity of the regularized Beta function. Subsequent investigation by law enforcement confirmed that the gamblers we identified as "implausibly lucky" were indeed behaving illegally.

preprint2014arXiv

A Simple Direct Proof of Billingsley's Theorem

Billingsley's theorem (1972) asserts that the Poisson--Dirichlet process is the limit, as $n \to \infty$, of the process giving the relative log sizes of the largest prime factor, the second largest, and so on, of a random integer chosen uniformly from 1 to $n$. In this paper we give a new proof that directly exploits Dickman's asymptotic formula for the number of such integers with no prime factor larger than $n^{1/u}$, namely $Ψ(n,n^{1/u}) \sim n ρ(u)$, to derive the limiting joint density functions of the finite-dimensional projections of the log prime factor processes. Our main technical tool is a new criterion for the convergence in distribution of non-lattice discrete random variables to continuous random variables.

preprint2014arXiv

Bounded size bias coupling: a Gamma function bound, and universal Dickman-function behavior

Under the assumption that the distribution of a nonnegative random variable $X$ admits a bounded coupling with its size biased version, we prove simple and strong concentration bounds. In particular the upper tail probability is shown to decay at least as fast as the reciprocal of a Gamma function, guaranteeing a moment generating function that converges everywhere. The class of infinitely divisible distributions with finite mean, whose Lévy measure is supported on an interval contained in $[0,c]$ for some $c < \infty$, forms a special case in which this upper bound is logarithmically sharp. In particular the asymptotic estimate for the Dickman function, that $ρ(u) \approx u^{-u}$ for large $u$, is shown to be universal for this class. A special case of our bounds arises when $X$ is a sum of independent random variables, each admitting a 1-bounded size bias coupling. In this case, our bounds are comparable to Chernoff--Hoeffding bounds; however, ours are broader in scope, sharper for the upper tail, and equal for the lower tail. We discuss \emph{bounded} and \emph{monotone} couplings, give a sandwich principle, and show how this gives an easy conceptual proof that any finite positive mean sum of independent Bernoulli random variables admits a 1-bounded coupling with the same conditioned to be nonzero.

preprint2014arXiv

Extensions of Billingsley's Theorem via Multi-Intensities

Let $p_1 \ge p_2 \ge \dots$ be the prime factors of a random integer chosen uniformly from $1$ to $n$, and let $$ \frac{\log p_1}{\log n}, \frac{\log p_2}{\log n}, \dots $$ be the sequence of scaled log factors. Billingsley's Theorem (1972), in its modern formulation, asserts that the limiting process, as $n \to \infty$, is the Poisson-Dirichlet process with parameter $θ=1$. In this paper we give a new proof, inspired by the 1993 proof by Donnelly and Grimmett, and extend the result to factorizations of elements of normed arithmetic semigroups satisfying certain growth conditions, for which the limiting Poisson-Dirichlet process need not have $θ=1$. We also establish Poisson-Dirichlet limits, with $θ\ne 1$, for ordinary integers conditional on the number of prime factors deviating from the usual value $\log \log n$. At the core of our argument is a purely probabilistic lemma giving a new criterion for convergence in distribution to a Poisson-Dirichlet process, from which the number-theoretic applications follow as straightforward corollaries. The lemma uses ingredients similar to those employed by Donnelly and Grimmett, but reorganized so as to allow subsequent number theory input to be processed as rapidly as possible. A by-product of this work is a new characterization of Poisson-Dirichlet processes in terms of multi-intensities.

preprint2014arXiv

Poisson--Dirichlet Limit Theorems in Combinatorial Applications via Multi-Intensities

We present new, exceptionally efficient proofs of Poisson--Dirichlet limit theorems for the scaled sizes of irreducible components of random elements in the classic combinatorial contexts of arbitrary assemblies, multisets, and selections, when the components generating functions satisfy certain standard hypotheses. The proofs exploit a new criterion for Poisson--Dirichlet limits, originally designed for rapid proofs of Billingsley's theorem on the scaled sizes of log prime factors of random integers (and some new generalizations). Unexpectedly, the technique applies in the present combinatorial setting as well, giving, perhaps, a long sought-after unifying point of view. The proofs depend also on formulas of Arratia and Tavar{é} for the mixed moments of counts of components of various sizes, as well as formulas of Flajolet and Soria for the asymptotics of generating function coefficients.

preprint2014arXiv

Scale-free and power law distributions via fixed points and convergence of (thinning and conditioning) transformations

In discrete contexts such as the degree distribution for a graph, \emph{scale-free} has traditionally been \emph{defined} to be \emph{power-law}. We propose a reasonable interpretation of \emph{scale-free}, namely, invariance under the transformation of $p$-thinning, followed by conditioning on being positive. For each $β\in (1,2)$, we show that there is a unique distribution which is a fixed point of this transformation; the distribution is power-law-$β$, and different from the usual Yule--Simon power law-$β$ that arises in preferential attachment models. In addition to characterizing these fixed points, we prove convergence results for iterates of the transformation.

preprint2013arXiv

Independent Process Approximations for Random Combinatorial Structures

Many random combinatorial objects have a component structure whose joint distribution is equal to that of a process of mutually independent random variables, conditioned on the value of a weighted sum of the variables. It is interesting to compare the combinatorial structure directly to the independent discrete process, without renormalizing. The quality of approximation can often be conveniently quantified in terms of total variation distance, for functionals which observe part, but not all, of the combinatorial and independent processes. Among the examples are combinatorial assemblies (e.g. permutations, random mapping functions, and partitions of a set), multisets (e.g. polynomials over a finite field, mapping patterns and partitions of an integer), and selections (e.g. partitions of an integer into distinct parts, and square-free polynomials over finite fields). We consider issues common to all the above examples, including equalities and upper bounds for total variation distances, existence of limiting processes, heuristics for good approximations, the relation to standard generating functions, moment formulas and recursions for computing densities, refinement to the process which counts the number of parts of each possible type, the effect of further conditioning on events of moderate probability, large deviation theory and nonuniform measures on combinatorial objects, and the possibility of getting useful results by overpowering the conditioning.

preprint2013arXiv

On the Amount of Dependence in the Prime Factorization of a Uniform Random Integer

How much dependence is there in the prime factorization of a random integer distributed uniformly from 1 to n? How much dependence is there in the decomposition into cycles of a random permutation of n points? What is the relation between the Poisson-Dirichlet process and the scale invariant Poisson process? These three questions have essentially the same answers, with respect to total variation distance, considering only small components, and with respect to a Wasserstein distance, considering all components. The Wasserstein distance is the expected number of changes -- insertions and deletions -- needed to change the dependent system into an independent system. In particular we show that for primes, roughly speaking, 2+o(1) changes are necessary and sufficient to convert a uniformly distributed random integer from 1 to n into a random integer prod_{p leq n} p^{Z_p} in which the multiplicity Z_p of the factor p is geometrically distributed, with all Z_p independent. The changes are, with probability tending to 1, one deletion, together with a random number of insertions, having expectation 1+o(1). The crucial tool for showing that 2+epsilon suffices is a coupling of the infinite independent model of prime multiplicities, with the scale invariant Poisson process on (0,infty). A corollary of this construction is the first metric bound on the distance to the Poisson-Dirichlet in Billingsley's 1972 weak convergence result. Our bound takes the form: there are couplings in which E sum |log P_i(n) - (log n) V_i | = O(\log \log n), where P_i denotes the i-th largest prime factor and V_i denotes the i-th component of the Poisson-Dirichlet process. It is reasonable to conjecture that O(1) is achievable.

preprint2013arXiv

On the Random Sampling of Pairs, with Pedestrian examples

Suppose one desires to randomly sample a pair of objects such as socks, hoping to get a matching pair. Even in the simplest situation for sampling, which is sampling with replacement, the innocent phrase "the distribution of the color of a matching pair" is ambiguous. One interpretation is that we condition on the event of getting a match between two random socks; this corresponds to sampling two at a time, over and over without memory, until a matching pair is found. A second interpretation is to sample sequentially, one at a time, with memory, until the same color has been seen twice. We study the difference between these two methods. The input is a discrete probability distribution on colors, describing what happens when one sock is sampled. There are two derived distributions --- the pair-color distributions under the two methods of getting a match. The output, a number we call the discrepancy of the input distribution, is the total variation distance between the two derived distributions. It is easy to determine when the two pair-color distributions come out equal, that is, to determine which distributions have discrepancy zero, but hard to determine the largest possible discrepancy. We find the exact extreme for the case of two colors, by analyzing the roots of a fifth degree polynomial in one variable. We find the exact extreme for the case of three colors, by analyzing the 49 roots of a variety spanned by two seventh-degree polynomials in two variables. We give a plausible conjecture for the general situation of a finite number of colors, and give an exact computation of a constant which is a plausible candidate for the supremum of the discrepancy over all discrete probability distributions. We briefly consider the more difficult case where the objects to be matched into pairs are of two different kinds, such as male-female or left-right.

preprint2012arXiv

On the singularity of random Bernoulli matrices - novel integer partitions and lower bound expansions

We prove a lower bound expansion on the probability that a random $\pm 1$ matrix is singular, and conjecture that such expansions govern the actual probability of singularity. These expansions are based on naming the most likely, second most likely, and so on, ways that a Bernoulli matrix can be singular; the most likely way is to have a null vector of the form $e_i \pm e_j$, which corresponds to the integer partition 11, with two parts of size 1. The second most likely way is to have a null vector of the form $e_i \pm e_j \pm e_k \pm e_\ell$, which corresponds to the partition 1111. The fifth most likely way corresponds to the partition 21111. We define and characterize the "novel partitions" which show up in this series. As a family, novel partitions suffice to detect singularity, i.e., any singular Bernoulli matrix has a left null vector whose underlying integer partition is novel. And, with respect to this property, the family of novel partitions is minimal. We prove that the only novel partitions with six or fewer parts are 11, 1111, 21111, 111111, 221111, 311111, and 322111. We prove that there are fourteen novel partitions having seven parts. We formulate a conjecture about which partitions are "first place and runners up," in relation to the Erdős-Littlewood-Offord bound. We prove some bounds on the interaction between left and right null vectors.

preprint2010arXiv

Size bias, sampling, the waiting time paradox, and infinite divisibility: when is the increment independent?

With $X^*$ denoting a random variable with the $X$-size bias distribution, what are all distributions for $X$ such that it is possible to have $X^*=X+Y$, $Y\geq 0$, with $X$ and $Y$ {\em independent}? We give the answer, due to Steutel \cite{steutel}, and also discuss the relations of size biasing to the waiting time paradox, renewal theory, sampling, tightness and uniform integrability, compound Poisson distributions, infinite divisibility, and the lognormal distributions.