Source author record

Ralph Neininger

Ralph Neininger appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.PR Data Structures and Algorithms Discrete Mathematics math.CO Applications math.ST Statistics Theory

Catalog footprint

What is connected

17works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

A note on the independence number, domination number and related parameters of random binary search trees and random recursive trees

We identify the mean growth of the independence number of random binary search trees and random recursive trees and show normal fluctuations around their means. Similarly we also show normal limit laws for the domination number and variations of it for these two cases of random tree models. Our results are an application of a recent general theorem of Holmgren and Janson on fringe trees in these two random tree models.

preprint2016arXiv

Dependence and phase changes in random $m$-ary search trees

We study the joint asymptotic behavior of the space requirement and the total path length (either summing over all root-key distances or over all root-node distances) in random $m$-ary search trees. The covariance turns out to exhibit a change of asymptotic behavior: it is essentially linear when $3\le m\le 13$ but becomes of higher order when $m\ge14$. Surprisingly, the corresponding asymptotic correlation coefficient tends to zero when $3\le m\le 26$ but is periodically oscillating for larger $m$. Such a less anticipated phenomenon is not exceptional and we extend the results in two directions: one for more general shape parameters, and the other for other classes of random log-trees such as fringe-balanced binary search trees and quadtrees. The methods of proof combine asymptotic transfer for the underlying recurrence relations with the contraction method.

preprint2015arXiv

A Limit Theorem for Radix Sort and Tries with Markovian Input

Tries are among the most versatile and widely used data structures on words. In particular, they are used in fundamental sorting algorithms such as radix sort which we study in this paper. While the performance of radix sort and tries under a realistic probabilistic model for the generation of words is of significant importance, its analysis, even for simplest memoryless sources, has proved difficult. In this paper we consider a more realistic model where words are generated by a Markov source. By a novel use of the contraction method combined with moment transfer techniques we prove a central limit theorem for the complexity of radix sort and for the external path length in a trie. This is the first application of the contraction method to the analysis of algorithms and data structures with Markovian inputs; it relies on the use of systems of stochastic recurrences combined with a product version of the Zolotarev metric.

preprint2015arXiv

A multiple filter test for the detection of rate changes in renewal processes with varying variance

Nonstationarity of the event rate is a persistent problem in modeling time series of events, such as neuronal spike trains. Motivated by a variety of patterns in neurophysiological spike train recordings, we define a general class of renewal processes. This class is used to test the null hypothesis of stationary rate versus a wide alternative of renewal processes with finitely many rate changes (change points). Our test extends ideas from the filtered derivative approach by using multiple moving windows simultaneously. To adjust the rejection threshold of the test, we use a Gaussian process, which emerges as the limit of the filtered derivative process. We also develop a multiple filter algorithm, which can be used when the null hypothesis is rejected in order to estimate the number and location of change points. We analyze the benefits of multiple filtering and its increased detection probability as compared to a single window approach. Application to spike trains recorded from dopamine midbrain neurons in anesthetized mice illustrates the relevance of the proposed techniques as preprocessing steps for methods that assume rate stationarity. In over 70% of all analyzed spike trains classified as rate nonstationary, different change points were detected by different window sizes.

preprint2015arXiv

Average Case and Distributional Analysis of Dual-Pivot Quicksort

In 2009, Oracle replaced the long-serving sorting algorithm in its Java 7 runtime library by a new dual-pivot Quicksort variant due to Vladimir Yaroslavskiy. The decision was based on the strikingly good performance of Yaroslavskiy's implementation in running time experiments. At that time, no precise investigations of the algorithm were available to explain its superior performance - on the contrary: Previous theoretical studies of other dual-pivot Quicksort variants even discouraged the use of two pivots. Only in 2012, two of the authors gave an average case analysis of a simplified version of Yaroslavskiy's algorithm, proving that savings in the number of comparisons are possible. However, Yaroslavskiy's algorithm needs more swaps, which renders the analysis inconclusive. To force the issue, we herein extend our analysis to the fully detailed style of Knuth: We determine the exact number of executed Java Bytecode instructions. Surprisingly, Yaroslavskiy's algorithm needs sightly more Bytecode instructions than a simple implementation of classic Quicksort - contradicting observed running times. Like in Oracle's library implementation we incorporate the use of Insertionsort on small subproblems and show that it indeed speeds up Yaroslavskiy's Quicksort in terms of Bytecodes; but even with optimal Insertionsort thresholds the new Quicksort variant needs slightly more Bytecode instructions on average. Finally, we show that the (suitably normalized) costs of Yaroslavskiy's algorithm converge to a random variable whose distribution is characterized by a fixed-point equation. From that, we compute variances of costs and show that for large n, costs are concentrated around their mean.

preprint2015arXiv

On a functional contraction method

Methods for proving functional limit laws are developed for sequences of stochastic processes which allow a recursive distributional decomposition either in time or space. Our approach is an extension of the so-called contraction method to the space $\mathcal{C}[0,1]$ of continuous functions endowed with uniform topology and the space $\mathcal {D}[0,1]$ of càdlàg functions with the Skorokhod topology. The contraction method originated from the probabilistic analysis of algorithms and random trees where characteristics satisfy natural distributional recurrences. It is based on stochastic fixed-point equations, where probability metrics can be used to obtain contraction properties and allow the application of Banach's fixed-point theorem. We develop the use of the Zolotarev metrics on the spaces $\mathcal{C}[0,1]$ and $\mathcal{D}[0,1]$ in this context. Applications are given, in particular, a short proof of Donsker's functional limit theorem is derived and recurrences arising in the probabilistic analysis of algorithms are discussed.

preprint2015arXiv

The CLT Analogue for Cyclic Urns

A cyclic urn is an urn model for balls of types $0,\ldots,m-1$ where in each draw the ball drawn, say of type $j$, is returned to the urn together with a new ball of type $j+1 \mod m$. The case $m=2$ is the well-known Friedman urn. The composition vector, i.e., the vector of the numbers of balls of each type after $n$ steps is, after normalization, known to be asymptotically normal for $2\le m\le 6$. For $m\ge 7$ the normalized composition vector does not converge. However, there is an almost sure approximation by a periodic random vector. In this paper the asymptotic fluctuations around this periodic random vector are identified. We show that these fluctuations are asymptotically normal for all $m\ge 7$. However, they are of maximal dimension $m-1$ only when $6$ does not divide $m$. For $m$ being a multiple of $6$ the fluctuations are supported by a two-dimensional subspace.

preprint2014arXiv

Analysis of radix selection on Markov sources

The complexity of the algorithm Radix Selection is considered for independent data generated from a Markov source. The complexity is measured by the number of bucket operations required and studied as a stochastic process indexed by the ranks; also the case of a uniformly chosen rank is considered. The orders of mean and variance of the complexity and limit theorems are derived. We find weak convergence of the appropriately normalized complexity towards a Gaussian process with explicit mean and covariance functions (in the space D[0,1] of cadlag functions on [0,1] with the Skorokhod metric) for uniform data and the asymmetric Bernoulli model. For uniformly chosen ranks and uniformly distributed data the normalized complexity was known to be asymptotically normal. For a general Markov source (excluding the uniform case) we find that this complexity is less concentrated and admits a limit law with non-normal limit distribution.

preprint2013arXiv

A Gaussian limit process for optimal FIND algorithms

We consider versions of the FIND algorithm where the pivot element used is the median of a subset chosen uniformly at random from the data. For the median selection we assume that subsamples of size asymptotic to $c \cdot n^α$ are chosen, where $0<α\le \frac{1}{2}$, $c>0$ and $n$ is the size of the data set to be split. We consider the complexity of FIND as a process in the rank to be selected and measured by the number of key comparisons required. After normalization we show weak convergence of the complexity to a centered Gaussian process as $n\to\infty$, which depends on $α$. The proof relies on a contraction argument for probability distributions on c{à}dl{à}g functions. We also identify the covariance function of the Gaussian limit process and discuss path and tail properties.

preprint2013arXiv

A limit process for partial match queries in random quadtrees and $2$-d trees

We consider the problem of recovering items matching a partially specified pattern in multidimensional trees (quadtrees and $k$-d trees). We assume the traditional model where the data consist of independent and uniform points in the unit square. For this model, in a structure on $n$ points, it is known that the number of nodes $C_n(ξ)$ to visit in order to report the items matching a random query $ξ$, independent and uniformly distributed on $[0,1]$, satisfies $\mathbf {E}[{C_n(ξ)}]\simκn^β$, where $κ$ and $β$ are explicit constants. We develop an approach based on the analysis of the cost $C_n(s)$ of any fixed query $s\in[0,1]$, and give precise estimates for the variance and limit distribution of the cost $C_n(x)$. Our results permit us to describe a limit process for the costs $C_n(x)$ as $x$ varies in $[0,1]$; one of the consequences is that $\mathbf {E}[{\max_{x\in[0,1]}C_n(x)}]\sim γn^β$; this settles a question of Devroye [Pers. Comm., 2000].

preprint2013arXiv

A statistical view on exchanges in Quickselect

In this paper we study the number of key exchanges required by Hoare's FIND algorithm (also called Quickselect) when operating on a uniformly distributed random permutation and selecting an independent uniformly distributed rank. After normalization we give a limit theorem where the limit law is a perpetuity characterized by a recursive distributional equation. To make the limit theorem usable for statistical methods and statistical experiments we provide an explicit rate of convergence in the Kolmogorov--Smirnov metric, a numerical table of the limit law's distribution function and an algorithm for exact simulation from the limit distribution. We also investigate the limit law's density. This case study provides a program applicable to other cost measures, alternative models for the rank selected and more balanced choices of the pivot element such as median-of-$2t+1$ versions of Quickselect as well as further variations of the algorithm.

preprint2013arXiv

Refined Quicksort asymptotics

The complexity of the Quicksort algorithm is usually measured by the number of key comparisons used during its execution. When operating on a list of $n$ data, permuted uniformly at random, the appropriately normalized complexity $Y_n$ is known to converge almost surely to a non-degenerate random limit $Y$. This assumes a natural embedding of all $Y_n$ on one probability space, e.g., via random binary search trees. In this note a central limit theorem for the error term in the latter almost sure convergence is shown: $$\sqrt{\frac{n}{2\log n}}(Y_n-Y) \stackrel{d}{\longrightarrow} {\cal N} \qquad (n\to\infty),$$ where ${\cal N}$ denotes a standard normal random variable.

preprint2012arXiv

Appendix to "Approximating perpetuities"

An algorithm for perfect simulation from the unique solution of the distributional fixed point equation $Y=_d UY + U(1-U)$ is constructed, where $Y$ and $U$ are independent and $U$ is uniformly distributed on $[0,1]$. This distribution comes up as a limit distribution in the probabilistic analysis of the Quickselect algorithm. Our simulation algorithm is based on coupling from the past with a multigamma coupler. It has four lines of code.

preprint2012arXiv

Asymptotic analysis of Hoppe trees

We introduce and analyze a random tree model associated to Hoppe's urn. The tree is built successively by adding nodes to the existing tree when starting with the single root node. In each step a node is added to the tree as a child of an existing node where these parent nodes are chosen randomly with probabilities proportional to their weights. The root node has weight $\vartheta>0$, a given fixed parameter, all other nodes have weight 1. This resembles the stochastic dynamic of Hoppe's urn. For $\vartheta=1$ the resulting tree is the well-studied random recursive tree. We analyze the height, internal path length and number of leaves of the Hoppe tree with $n$ nodes as well as the depth of the last inserted node asymptotically as $n\to \infty$. Mainly expectations, variances and asymptotic distributions of these parameters are derived.

preprint2012arXiv

Towards More Realistic Probabilistic Models for Data Structures: The External Path Length in Tries under the Markov Model

Tries are among the most versatile and widely used data structures on words. They are pertinent to the (internal) structure of (stored) words and several splitting procedures used in diverse contexts ranging from document taxonomy to IP addresses lookup, from data compression (i.e., Lempel-Ziv'77 scheme) to dynamic hashing, from partial-match queries to speech recognition, from leader election algorithms to distributed hashing tables and graph compression. While the performance of tries under a realistic probabilistic model is of significant importance, its analysis, even for simplest memoryless sources, has proved difficult. Rigorous findings about inherently complex parameters were rarely analyzed (with a few notable exceptions) under more realistic models of string generations. In this paper we meet these challenges: By a novel use of the contraction method combined with analytic techniques we prove a central limit theorem for the external path length of a trie under a general Markov source. In particular, our results apply to the Lempel-Ziv'77 code. We envision that the methods described here will have further applications to other trie parameters and data structures.

preprint2011arXiv

Partial match queries in random quadtrees

We consider the problem of recovering items matching a partially specified pattern in multidimensional trees (quad trees and k-d trees). We assume the traditional model where the data consist of independent and uniform points in the unit square. For this model, in a structure on $n$ points, it is known that the number of nodes $C_n(ξ)$ to visit in order to report the items matching an independent and uniformly on $[0,1]$ random query $ξ$ satisfies $\Ec{C_n(ξ)}\sim κn^β$, where $κ$ and $β$ are explicit constants. We develop an approach based on the analysis of the cost $C_n(x)$ of any fixed query $x\in [0,1]$, and give precise estimates for the variance and limit distribution of the cost $C_n(x)$. Our results permit to describe a limit process for the costs $C_n(x)$ as $x$ varies in $[0,1]$; one of the consequences is that $E{\max_{x\in [0,1]} C_n(x)} \sim γn^β$.

preprint2004arXiv

On the contraction method with degenerate limit equation

A class of random recursive sequences (Y_n) with slowly varying variances as arising for parameters of random trees or recursive algorithms leads after normalizations to degenerate limit equations of the form X\stackrel{L}{=}X. For nondegenerate limit equations the contraction method is a main tool to establish convergence of the scaled sequence to the ``unique'' solution of the limit equation. In this paper we develop an extension of the contraction method which allows us to derive limit theorems for parameters of algorithms and data structures with degenerate limit equation. In particular, we establish some new tools and a general convergence scheme, which transfers information on mean and variance into a central limit law (with normal limit). We also obtain a convergence rate result. For the proof we use selfdecomposability properties of the limit normal distribution which allow us to mimic the recursive sequence by an accompanying sequence in normal variables.

Ralph Neininger

What is connected

Connect this record

See the researcher in context

Building this map preview

17 published item(s)

A note on the independence number, domination number and related parameters of random binary search trees and random recursive trees

Dependence and phase changes in random $m$-ary search trees

A Limit Theorem for Radix Sort and Tries with Markovian Input

A multiple filter test for the detection of rate changes in renewal processes with varying variance

Average Case and Distributional Analysis of Dual-Pivot Quicksort

On a functional contraction method

The CLT Analogue for Cyclic Urns

Analysis of radix selection on Markov sources

A Gaussian limit process for optimal FIND algorithms

A limit process for partial match queries in random quadtrees and $2$-d trees

A statistical view on exchanges in Quickselect

Refined Quicksort asymptotics

Appendix to "Approximating perpetuities"

Asymptotic analysis of Hoppe trees

Towards More Realistic Probabilistic Models for Data Structures: The External Path Length in Tries under the Markov Model

Partial match queries in random quadtrees

On the contraction method with degenerate limit equation