Source author record

Stephen DeSalvo

Stephen DeSalvo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

13 works
4 topics
4 close collaborators

Published work

13 published item(s)

preprint, 2016, arXiv

A robust quantitative local central limit theorem with applications to enumerative combinatorics and random combinatorial structures

A useful heuristic in the understanding of large random combinatorial structures is the Arratia-Tavaré principle, which describes an approximation to the joint distribution of component-sizes using independent random variables. The principle outlines conditions under which the total variation distance between the true joint distribution and the approximation should be small, and was successfully exploited by Pittel in the cases of integer partitions and set partitions. We provide sufficient conditions for this principle to be true in a general context, valid for certain discrete probability distributions which are $\textit{perturbed log-concave}$, via a quantitative local central limit theorem. We then use it to generalize some classical asymptotic statistics in combinatorial theory, as well as assert some new ones.

preprint, 2016, arXiv

Completely effective error bounds for Stirling Numbers of the first and second kind via Poisson Approximation

We provide completely effective error estimates for Stirling numbers of the first and second kind, denoted by $s(n,m)$ and $S(n,m)$, respectively. These bounds are useful for values of $m \geq n - O(\sqrt{n})$. An application of our Theorem 5 yields, for example, \[ s(10^{12},\ 10^{12}-2\times 10^6)/10^{35664464} \in [ 1.87669, 1.876982 ], \] \[ S(10^{12},\ 10^{12}-2\times 10^6)/10^{35664463} \in [ 1.30121, 1.306975 ]. \] The bounds are obtained via Chen-Stein Poisson approximation, using an interpretation of Stirling numbers as the number of ways of placing non-attacking rooks on a chess board. As a corollary to Theorem 5, summarized in Proposition 1, we obtain two simple and explicit asymptotic formulas, one for each of $s(n,m)$ and $S(n,m)$, for the parametrization $m = n - t\, n^a$, $0 \leq a \leq \frac{1}{2}.$ These asymptotic formulas agree with the ones originally observed by Moser and Wyman in the range $0<a<\frac{1}{2}$, and they connect with a recent asymptotic expansion by Louchard for $\frac{1}{2}<a < 1$, hence filling the gap at $a = \frac{1}{2}$. We also provide a generalization applicable to rook and file numbers.
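For small arguments, both kinds of Stirling numbers can be tabulated exactly from their standard recurrences, which gives a ground truth for checking any approximation. The sketch below is not the method of the abstract (it uses no Poisson approximation). It assumes that $s(n,m)$ denotes the unsigned Stirling numbers of the first kind, and the function name `stirling_tables` is purely illustrative.

```python
from math import comb


def stirling_tables(n_max):
    """Return (c, S): unsigned Stirling numbers of the first kind c[n][m]
    and Stirling numbers of the second kind S[n][m], for 0 <= m <= n <= n_max."""
    c = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    S = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    c[0][0] = S[0][0] = 1
    for n in range(1, n_max + 1):
        for m in range(1, n + 1):
            # Element n either starts a new cycle or is inserted after
            # one of the n-1 earlier elements in an existing cycle.
            c[n][m] = c[n - 1][m - 1] + (n - 1) * c[n - 1][m]
            # Element n either forms a new block or joins one of m blocks.
            S[n][m] = S[n - 1][m - 1] + m * S[n - 1][m]
    return c, S


c, S = stirling_tables(10)
print(c[5][2], S[5][2])
# Just below the diagonal both numbers equal a binomial coefficient.
print(c[10][9] == comb(10, 2), S[10][9] == comb(10, 2))
```

The exact tables are only practical for modest $n$; the regime $m \geq n - O(\sqrt{n})$ with $n = 10^{12}$ quoted in the abstract is far outside their reach, which is where explicit error bounds become useful.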

preprint, 2016, arXiv

Exact sampling algorithms for Latin squares and Sudoku matrices via probabilistic divide-and-conquer

We provide several algorithms for the exact, uniform random sampling of Latin squares and Sudoku matrices via probabilistic divide-and-conquer (PDC). Our approach divides the sample space into smaller pieces, samples each separately, and combines them in a manner which yields an exact sample from the target distribution. We demonstrate, in particular, a version of PDC in which one of the pieces is sampled using a brute force approach, which we dub $\textit{almost deterministic second half}$, as it is a generalization of a previous application of PDC for which one of the pieces is uniquely determined given the others.

preprint, 2016, arXiv

Improvements to exact Boltzmann sampling using probabilistic divide-and-conquer and the recursive method

We demonstrate an approach for exact sampling of certain discrete combinatorial distributions, which is a hybrid of exact Boltzmann sampling and the recursive method, using probabilistic divide-and-conquer (PDC). The approach specializes to exact Boltzmann sampling in the trivial setting, and specializes to PDC deterministic second half in the first non-trivial application. A large class of examples is given for which this method broadly applies, and several examples are worked out explicitly.

preprint, 2016, arXiv

Poisson and independent process approximation for random combinatorial structures with a given number of components, and near-universal behavior for low rank assemblies

We give a general framework for approximations to combinatorial assemblies, especially suitable to the situation where the number $k$ of components is specified, in addition to the overall size $n$. This involves a Poisson process, which, with the appropriate choice of parameter, may be viewed as an extension of saddlepoint approximation. We illustrate the use of this by analyzing the component structure when the rank and size are specified, and the rank, $r := n-k$, is small relative to $n$. There is near-universal behavior, in the sense that apart from cases where the exponential generating function has radius of convergence zero, for $\ell=1,2,\dots$, when $r \asymp n^\alpha$ for fixed $\alpha\in (\frac{\ell}{\ell+1}, \frac{\ell+1}{\ell+2})$, the size $L_1$ of the largest component converges in probability to $\ell+2$. Further, when $r \sim t\, n^{\ell/(\ell+1)}$ for a positive integer $\ell$, and $t \in (0,\infty)$, $\mathbb{P}\,(L_1 \in \{\ell+1,\ell+2\}) \to 1$, with the choice governed by a Poisson limit distribution for the number of components of size $\ell+2$. This was previously observed, for the case $\ell=1$ and the special cases of permutations and set partitions, using Chen-Stein approximations for the indicators of attacks and alignments, when rooks are placed randomly on a triangular board. The case $\ell=1$ is especially delicate, and was not handled by previous saddlepoint approximations.

preprint, 2016, arXiv

Probabilistic divide-and-conquer: deterministic second half

We present a probabilistic divide-and-conquer (PDC) method for $\textit{exact}$ sampling of conditional distributions of the form $\mathcal{L}( {\bf X}\, |\, {\bf X} \in E)$, where ${\bf X}$ is a random variable on $\mathcal{X}$, a complete, separable metric space, and event $E$ with $\mathbb{P}(E) \geq 0$ is assumed to have sufficient regularity such that the conditional distribution exists and is unique up to almost sure equivalence. The PDC approach is to define a decomposition of $\mathcal{X}$ via sets $\mathcal{A}$ and $\mathcal{B}$ such that $\mathcal{X} = \mathcal{A} \times \mathcal{B}$, and sample from each separately. The deterministic second half approach is to select the sets $\mathcal{A}$ and $\mathcal{B}$ such that for each element $a\in \mathcal{A}$, there is only one element $b_a \in \mathcal{B}$ for which $(a,b_a)\in E$. We show how this simple approach provides non-trivial improvements to several conventional random sampling algorithms in combinatorics, and we demonstrate its versatility with applications to sampling from sufficiently regular conditional distributions.
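As one illustration of the deterministic second half idea, consider uniform sampling of integer partitions of $n$. The sketch below assumes the standard independent-geometric model for part multiplicities: $Z_i$ is the number of parts of size $i$, with $\mathbb{P}(Z_i = k) = (1-x^i)\,x^{ik}$, and $E = \{\sum_i i Z_i = n\}$. Taking $\mathcal{A}$ as $(Z_2,\dots,Z_n)$ and $\mathcal{B}$ as $Z_1$, the value of $Z_1$ is forced once the rest is drawn, and the proposal is accepted with probability $\mathbb{P}(Z_1 = k)/\max_j \mathbb{P}(Z_1 = j) = x^k$. The function name and the choice $x = e^{-\pi/\sqrt{6n}}$ are assumptions made for this example, not details taken from the abstract.

```python
import math
import random


def sample_partition_pdc(n, rng=None):
    """Sample a uniform partition of n, returned as {part size: multiplicity}."""
    rng = rng or random.Random()
    x = math.exp(-math.pi / math.sqrt(6 * n))  # tilting parameter (assumed choice)
    log_x = math.log(x)
    while True:
        counts = {}
        total = 0
        # First half: multiplicities of the parts 2..n, drawn independently.
        for i in range(2, n + 1):
            u = 1.0 - rng.random()                # uniform on (0, 1]
            z = int(math.log(u) / (i * log_x))    # geometric, P(Z >= k) = x^(i*k)
            if z:
                counts[i] = z
                total += i * z
            if total > n:                         # overshoot: certain rejection
                break
        k = n - total                             # forced number of parts of size 1
        if k < 0:
            continue
        # Second half is deterministic; accept with probability x^k.
        if rng.random() < x ** k:
            if k:
                counts[1] = k
            return counts


partition = sample_partition_pdc(30, random.Random(0))
print(partition, sum(i * z for i, z in partition.items()))
```

Compared with drawing all of $Z_1,\dots,Z_n$ and rejecting unless the sum is exactly $n$, this version only needs the first half not to overshoot and then passes a single biased coin flip.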

preprint, 2016, arXiv

The probability of avoiding consecutive patterns in the Mallows distribution

We use various combinatorial and probabilistic techniques to study growth rates for the probability that a random permutation from the Mallows distribution avoids consecutive patterns. The Mallows distribution behaves like a $q$-analogue of the uniform distribution by weighting each permutation $\pi$ by $q^{\mathrm{inv}(\pi)}$, where $\mathrm{inv}(\pi)$ is the number of inversions in $\pi$ and $q$ is a positive, real-valued parameter. We prove that the growth rate exists for all patterns and all $q>0$, and we generalize Goulden and Jackson's cluster method to keep track of the number of inversions in permutations avoiding a given consecutive pattern. Using singularity analysis, we approximate the growth rates for length-3 patterns, monotone patterns, and non-overlapping patterns starting with 1, and we compare growth rates between different patterns. We also use Stein's method to show that, under certain assumptions on $q$, the length of $\sigma$, and $\mathrm{inv}(\sigma)$, the number of occurrences of a given pattern $\sigma$ is well approximated by the normal distribution.
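For very small $n$, the avoidance probability under the Mallows weighting can be computed by direct enumeration, which makes the definition concrete. The following is a brute-force sketch only, not the cluster method or singularity analysis mentioned in the abstract; the function names are illustrative.

```python
from itertools import permutations


def inversions(perm):
    """Number of pairs i < j with perm[i] > perm[j]."""
    return sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )


def order_type(seq):
    """Indices of seq sorted by value; equal for order-isomorphic sequences."""
    return tuple(sorted(range(len(seq)), key=lambda i: seq[i]))


def contains_consecutive(perm, pattern):
    """True if some window of adjacent entries is order-isomorphic to pattern."""
    k = len(pattern)
    target = order_type(pattern)
    return any(
        order_type(perm[s:s + k]) == target for s in range(len(perm) - k + 1)
    )


def mallows_avoidance_probability(n, q, pattern):
    """P(a Mallows(q) permutation of size n avoids the consecutive pattern)."""
    total = 0.0
    avoiding = 0.0
    for perm in permutations(range(1, n + 1)):
        weight = q ** inversions(perm)  # unnormalized Mallows weight
        total += weight
        if not contains_consecutive(perm, pattern):
            avoiding += weight
    return avoiding / total


print(mallows_avoidance_probability(4, 1.0, (1, 2, 3)))
print(mallows_avoidance_probability(3, 2.0, (1, 2, 3)))
```

With $q = 1$ this reduces to the uniform case. The enumeration costs $n!$ permutations, so it is only a sanity check for small sizes.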

preprint, 2015, arXiv

An Independent Process Approximation to Sparse Random Graphs with a Prescribed Number of Edges and Triangles

We prove a $\textit{pre-asymptotic}$ bound on the total variation distance between the uniform distribution over two types of undirected graphs with $n$ nodes. One distribution places a prescribed number of $k_T$ triangles and $k_S$ edges not involved in a triangle independently and uniformly over all possibilities, and the other is the uniform distribution over simple graphs with exactly $k_T$ triangles and $k_S$ edges not involved in a triangle. As a corollary, for $k_S = o(n)$ and $k_T = o(n)$ as $n$ tends to infinity, the total variation distance tends to $0$, at a rate that is given explicitly. Our main tool is Chen-Stein Poisson approximation, hence our bounds are explicit for all finite values of the parameters.

preprint, 2015, arXiv

Probabilistic divide-and-conquer: a new exact simulation method, with integer partitions as an example

We propose a new method, probabilistic divide-and-conquer, for improving the success probability in rejection sampling. For the example of integer partitions, there is an ideal recursive scheme which improves the rejection cost from asymptotically order $n^{3/4}$ to a constant. We show other examples for which a non-recursive, one-time application of probabilistic divide-and-conquer removes a substantial fraction of the rejection sampling cost. We also present a variation of probabilistic divide-and-conquer for generating i.i.d. samples that exploits features of the coupon collector's problem, in order to obtain a cost that is sublinear in the number of samples.

preprint, 2013, arXiv

On the Random Sampling of Pairs, with Pedestrian examples

Suppose one desires to randomly sample a pair of objects such as socks, hoping to get a matching pair. Even in the simplest situation for sampling, which is sampling with replacement, the innocent phrase "the distribution of the color of a matching pair" is ambiguous. One interpretation is that we condition on the event of getting a match between two random socks; this corresponds to sampling two at a time, over and over without memory, until a matching pair is found. A second interpretation is to sample sequentially, one at a time, with memory, until the same color has been seen twice. We study the difference between these two methods. The input is a discrete probability distribution on colors, describing what happens when one sock is sampled. There are two derived distributions: the pair-color distributions under the two methods of getting a match. The output, a number we call the discrepancy of the input distribution, is the total variation distance between the two derived distributions. It is easy to determine when the two pair-color distributions come out equal, that is, to determine which distributions have discrepancy zero, but hard to determine the largest possible discrepancy. We find the exact extreme for the case of two colors, by analyzing the roots of a fifth degree polynomial in one variable. We find the exact extreme for the case of three colors, by analyzing the 49 roots of a variety spanned by two seventh-degree polynomials in two variables. We give a plausible conjecture for the general situation of a finite number of colors, and give an exact computation of a constant which is a plausible candidate for the supremum of the discrepancy over all discrete probability distributions. We briefly consider the more difficult case where the objects to be matched into pairs are of two different kinds, such as male-female or left-right.
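The two sampling methods can be compared numerically for a small number of colors. The sketch below encodes one reading of the description above: the first method gives color $c$ probability $p_c^2 / \sum_j p_j^2$, and the second is computed by enumerating every order in which distinct colors can appear before the first repeat. The function names are illustrative, and the enumeration is only feasible for a handful of colors.

```python
def conditioned_pair_distribution(p):
    """Pair color when two socks are drawn at once, conditioned on a match."""
    total = sum(w * w for w in p)
    return [w * w / total for w in p]


def sequential_pair_distribution(p):
    """Pair color when socks are drawn one at a time until a color repeats."""
    k = len(p)
    result = [0.0] * k

    def recurse(seen, prob):
        # 'seen' holds the colors drawn exactly once so far; 'prob' is the
        # probability of the sequence of draws that led to this state.
        for c in range(k):
            if p[c] == 0:
                continue
            if c in seen:
                result[c] += prob * p[c]           # second sock of color c: stop
            else:
                recurse(seen | {c}, prob * p[c])   # new color: keep drawing

    recurse(frozenset(), 1.0)
    return result


def discrepancy(p):
    """Total variation distance between the two pair-color distributions."""
    a = conditioned_pair_distribution(p)
    b = sequential_pair_distribution(p)
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


print(discrepancy([0.5, 0.5]))
print(discrepancy([0.25, 0.75]))
```

For two colors with probabilities $p$ and $1-p$, the sequential method gives the first color probability $p^2(3-2p)$, since the only ways for it to repeat first are the draw sequences AA, ABA and BAA.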

preprint, 2012, arXiv

On the singularity of random Bernoulli matrices - novel integer partitions and lower bound expansions

We prove a lower bound expansion on the probability that a random $\pm 1$ matrix is singular, and conjecture that such expansions govern the actual probability of singularity. These expansions are based on naming the most likely, second most likely, and so on, ways that a Bernoulli matrix can be singular; the most likely way is to have a null vector of the form $e_i \pm e_j$, which corresponds to the integer partition 11, with two parts of size 1. The second most likely way is to have a null vector of the form $e_i \pm e_j \pm e_k \pm e_\ell$, which corresponds to the partition 1111. The fifth most likely way corresponds to the partition 21111. We define and characterize the "novel partitions" which show up in this series. As a family, novel partitions suffice to detect singularity, i.e., any singular Bernoulli matrix has a left null vector whose underlying integer partition is novel. And, with respect to this property, the family of novel partitions is minimal. We prove that the only novel partitions with six or fewer parts are 11, 1111, 21111, 111111, 221111, 311111, and 322111. We prove that there are fourteen novel partitions having seven parts. We formulate a conjecture about which partitions are "first place and runners up," in relation to the Erdős-Littlewood-Offord bound. We prove some bounds on the interaction between left and right null vectors.
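For tiny matrices the singularity probability can be found exactly by enumerating all $2^{n^2}$ sign patterns, which gives a concrete reference point for the lower bound expansion. This is a brute-force sketch rather than anything from the paper; the function names are illustrative, and the enumeration is only practical for about $n \leq 4$.

```python
from fractions import Fraction
from itertools import product


def det(rows):
    """Exact integer determinant by Laplace expansion along the first row."""
    n = len(rows)
    if n == 1:
        return rows[0][0]
    total = 0
    for j in range(n):
        # Minor: drop the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in rows[1:]]
        total += (-1) ** j * rows[0][j] * det(minor)
    return total


def singularity_probability(n):
    """Exact probability that a uniform random n x n matrix of +-1 is singular."""
    singular = 0
    for entries in product((-1, 1), repeat=n * n):
        rows = [entries[i * n:(i + 1) * n] for i in range(n)]
        if det(rows) == 0:
            singular += 1
    return Fraction(singular, 2 ** (n * n))


for n in (1, 2, 3):
    print(n, singularity_probability(n))
```

For $n = 3$, every singular matrix has two rows that are equal up to sign, which is the $e_i \pm e_j$ case described above as the most likely way to be singular.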