Researcher profile

Mark Huber

Mark Huber contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
17works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

17 published item(s)

preprint2021arXiv

Tail inequalities for restricted classes of discrete random variables

Let $X$ be an integrable discrete random variable over $\{0, 1, 2, \ldots\}$ with $\mathbb{P}(X = i + 1) \leq \mathbb{P}(X = i)$ for all $i$. Then for any integer $a \geq 1$, $\mathbb{P}(X \leq a) \leq \mathbb{E}[X] / (2a - 1)$. Let $W$ be an discrete random variable over $\{\ldots, -2, -1, 0, 1, 2, \ldots\}$ with finite second moment where the $\mathbb{P}(W = i)$ values are unimodal. Then $\mathbb{P}(|W - \mathbb{E}[W]| \geq a) \leq (\mathbb{V}(W) + 1 / 12) / (2(a - 1 / 2)^2)$.

preprint2020arXiv

Fewer colors for perfect simulation of proper colorings

Given a graph $G$ and color set $\{1, \ldots, k\}$, a $\textit{proper coloring}$ is an assignment of a color to each vertex of $G$ such that no two vertices connected by an edge are given the same color. The problem of drawing a proper coloring exactly uniformly from the set of proper colorings is well-studied. Most recently, Bhandari and Chakraborty developed a polynomial expected time randomized algorithm for obtaining such draws when $k > 3Δ$, where $Δ$ is the maximum degree of the graph. Their approach used a bounding chain together with the coupling from the past protocol. Here a new randomized algorithm is presented based upon the randomness recycler protocol introduced by the author and Fill at FOCS 2000. Given $n$ vertices, this method takes $O(n \ln (n))$ expected steps when $k > 2.27(Δ- 1)$ for all $Δ\geq 2$.

preprint2016arXiv

An estimator for Poisson means whose relative error distribution is known

Suppose that $X_1,X_2,\ldots$ are a stream of independent, identically distributed Poisson random variables with mean $μ$. This work presents a new estimate $μ_k$ for $μ$ with the property that the distribution of the relative error in the estimate ($(\hat μ_k/μ) - 1$) is known, and does not depend on $μ$ in any way. This enables the construction of simple exact confidence intervals for the estimate, as well as a means of obtaining fast approximation algorithms for high dimensional integration using TPA. The new estimate requires a random number of Poisson draws, and so is best suited to Monte Carlo applications. As an example of such an application, the method is applied to obtain an exact confidence interval for the normalizing constant of the Ising model.

preprint2016arXiv

Fast Perfect Simulation of Vervaat Perpetutities

This work presents a faster method of simulating exactly from a distribution known as a Vervaat perpetuity. A parameter of the Vervaat perpetuity is $β\in (0,\infty)$. An earlier method for simulating from this distributon ran in time $O((2.23β)^β).$ This earlier method utilized dominated coupling from the past that bounded a stochastic process for perpetuities from above. By extending to non-Markovian update functions, it is possible to create a new method that bounds the perpetuities from both above and below. This new approach is shown to run in $O(β\ln(β))$ time.

preprint2016arXiv

Multivariate distributions with fixed marginals and correlations

Consider the problem of drawing random variates $(X_1,\ldots,X_n)$ from a distribution where the marginal of each $X_i$ is specified, as well as the correlation between every pair $X_i$ and $X_j$. For given marginals, the Fréchet-Hoeffding bounds put a lower and upper bound on the correlation between $X_i$ and $X_j$. Any achievable correlation between $X_i$ and $X_j$ is a convex combinations of these bounds. The value $λ(X_i,X_j) \in [0,1]$ of this convex combination is called here the convexity parameter of $(X_i,X_j),$ with $λ(X_i,X_j) = 1$ corresponding to the upper bound and maximal correlation. For given marginal distributions functions $F_1,\ldots,F_n$ of $(X_1,\ldots,X_n)$ we show that $λ(X_i,X_j) = λ_{ij}$ if and only if there exist symmetric Bernoulli random variables $(B_1,\ldots,B_n)$ (that is $\{0,1\}$ random variables with mean 1/2) such that $λ(B_i,B_j) = λ_{ij}$. In addition, we characterize completely the set of convexity parameters for symmetric Bernoulli marginals in two, three and four dimensions.

preprint2016arXiv

Optimal linear Bernoulli factories for small mean problems

Suppose a coin with unknown probability $p$ of heads can be flipped as often as desired. A Bernoulli factory for a function $f$ is an algorithm that uses flips of the coin together with auxiliary randomness to flip a single coin with probability $f(p)$ of heads. Applications include near perfect sampling from the stationary distribution of regenerative processes. When $f$ is analytic, the problem can be reduced to a Bernoulli factory of the form $f(p) = Cp$ for constant $C$. Presented here is a new algorithm where for small values of $Cp$, requires roughly only $C$ coin flips to generate a $Cp$ coin. From information theory considerations, this is also conjectured to be (to first order) the minimum number of flips needed by any such algorithm. For $Cp$ large, the new algorithm can also be used to build a new Bernoulli factory that uses only 80\% of the expected coin flips of the older method, and applies to the more general problem of a multivariate Bernoulli factory, where there are $k$ coins, the $k$th coin has unknown probability $p_k$ of heads, and the goal is to simulate a coin flip with probability $C_1 p_1 + \cdots + C_k p_k$ of heads.

preprint2015arXiv

A Periodically Varying Luminous Quasar at z=2 from the Pan-STARRS1 Medium Deep Survey: A Candidate Supermassive Black Hole Binary in the Gravitational Wave-Driven Regime

Supermassive black hole binaries (SMBHBs) should be an inevitable consequence of the hierarchical growth of massive galaxies through mergers, and the strongest sirens of gravitational waves (GWs) in the cosmos. And yet, their direct detection has remained elusive due to the compact (sub-parsec) orbital separations of gravitationally bound SMBHBs. Here we exploit a theoretically predicted signature of a SMBHB in the time domain: periodic variability caused by a mass accretion rate that is modulated by the binary's orbital motion. We report our first significant periodically varying quasar detection from the systematic search in the Pan-STARRS1 (PS1) Medium Deep Survey. Our SMBHB candidate, PSO J334.2028+01.4075, is a luminous radio-loud quasar at $z=2.060$, with extended baseline photometry from the Catalina Real-Time Transient Survey, as well as archival spectroscopy from the FIRST Bright Quasar Survey. The observed period ($542 \pm 15$ days) and estimated black hole mass ($\log (M_{\rm BH}/M_\odot) = 9.97 \pm 0.50$), correspond to an orbital separation of $7^{+8}_{-4}$ Schwarzschild radii ($\sim 0.006^{+0.007}_{-0.003}$ pc), assuming the rest-frame period of the quasar variability traces the orbital period of the binary. This SMBHB candidate, discovered at the peak redshift for SMBH mergers, is in a physically stable configuration for a circumbinary accretion disk, and within the regime of GW-driven orbital decay. Our search with PS1 is a benchmark study for the exciting capabilities of LSST, which will have orders of magnitude larger survey power, and will potentially pinpoint the locations of thousands of SMBHBs in the variable night sky.

preprint2015arXiv

An unbiased estimate for the mean of a {0,1} random variable with relative error distribution independent of the mean

Say $X_1,X_2,\ldots$ are independent identically distributed Bernoulli random variables with mean $p$. This paper builds a new estimate $\hat p$ of $p$ that has the property that the relative error, $\hat p /p - 1$, of the estimate does not depend in any way on the value of $p$. This allows the construction of exact confidence intervals for $p$ of any desired level without needing any sort of limit or approximation. In addition, $\hat p$ is unbiased. For $ε$ and $δ$ in $(0,1)$, to obtain an estimate where $\mathbb{P}(|\hat p/p - 1| > ε) \leq δ$, the new algorithm takes on average at most $2ε^{-2} p^{-1}\ln(2δ^{-1})(1 - (14/3) ε)^{-1}$ samples. It is also shown that any such algorithm that applies whenever $p \leq 1/2$ requires at least $0.2ε^{-2} p^{-1}\ln((2-δ)δ^{-1})(1 + 2 ε)$ samples. The same algorithm can also be applied to estimate the mean of any random variable that falls in $[0,1]$.

preprint2015arXiv

Approximation algorithms for the normalizing constant of Gibbs distributions

Consider a family of distributions $\{π_β\}$ where $X\simπ_β$ means that $\mathbb{P}(X=x)=\exp(-βH(x))/Z(β)$. Here $Z(β)$ is the proper normalizing constant, equal to $\sum_x\exp(-βH(x))$. Then $\{π_β\}$ is known as a Gibbs distribution, and $Z(β)$ is the partition function. This work presents a new method for approximating the partition function to a specified level of relative accuracy using only a number of samples, that is, $O(\ln(Z(β))\ln(\ln(Z(β))))$ when $Z(0)\geq1$. This is a sharp improvement over previous, similar approaches that used a much more complicated algorithm, requiring $O(\ln(Z(β))\ln(\ln(Z(β)))^5)$ samples.

preprint2014arXiv

Differential expression analysis for multiple conditions

As high-throughput sequencing has become common practice, the cost of sequencing large amounts of genetic data has been drastically reduced, leading to much larger data sets for analysis. One important task is to identify biological conditions that lead to unusually high or low expression of a particular gene. Packages such as DESeq implement a simple method for testing differential signal when exactly two biological conditions are possible. For more than two conditions, pairwise testing is typically used. Here the DESeq method is extended so that three or more biological conditions can be assessed simultaneously. Because the computation time grows exponentially in the number of conditions, a Monte Carlo approach provides a fast way to approximate the $p$-values for the new test. The approach is studied on both simulated data and a data set of {\em C. jejuni}, the bacteria responsible for most food poisoning in the United States.

preprint2014arXiv

Improving Monte Carlo randomized approximation schemes

Consider a central problem in randomized approximation schemes that use a Monte Carlo approach. Given a sequence of independent, identically distributed random variables $X_1,X_2,\ldots$ with mean $μ$ and standard deviation at most $c μ$, where $c$ is a known constant, and $ε,δ> 0$, create an estimate $\hat μ$ for $μ$ such that $\text{P}(|\hat μ- μ| > εμ) \leq δ$. This technique has been used for building randomized approximation schemes for the volume of a convex body, the permanent of a nonnegative matrix, the number of linear extensions of a poset, the partition function of the Ising model and many other problems. Existing methods use (to the leading order) $19.35 (c/ε)^2 \ln(δ^{-1})$ samples. This is the best possible number up to the constant factor, and it is an open question as to what is the best constant possible. This work gives an easy to apply estimate that only uses $6.96 (c/ε)^2 \ln(δ^{-1})$ samples in the leading order.

preprint2014arXiv

Minimum correlation for any bivariate Geometric distribution

Consider a bivariate Geometric random variable where the first component has parameter $p_1$ and the second parameter $p_2$. It is not possible to make the correlation between the marginals equal to -1. Here the properties of this minimum correlation are studied both numerically and analytically. It is shown that the minimum correlation can be computed exactly in time $O(p_1^{-1} \ln(p_2^{-1}) + p_2^{-1} \ln(p_1^{-1}))$. The minimum correlation is shown to be nonmonotonic in $p_1$ and $p_2$, moreover, the partial derivatives are not continuous. For $p_1 = p_2$, these discontinuities are characterized completely and shown to lie near (1- roots of 1/2). In addition, we construct analytical bounds on the minimum correlation.

preprint2014arXiv

Nearly optimal Bernoulli factories for linear functions

Suppose that $X_1,X_2,\ldots$ are independent identically distributed Bernoulli random variables with mean $p$. A Bernoulli factory for a function $f$ takes as input $X_1,X_2,\ldots$ and outputs a random variable that is Bernoulli with mean $f(p).$ A fast algorithm is a function that only depends on the values of $X_1,\ldots,X_T$, where $T$ is a stopping time with small mean. When $f(p)$ is a real analytic function the problem reduces to being able to draw from linear functions $Cp$ for a constant $C > 1$. Also it is necessary that $Cp \leq 1 - ε$ for known $ε> 0$. Previous methods for this problem required extensive modification of the algorithm for every value of $C$ and $ε$. These methods did not have explicit bounds on $\text{E}[T]$ as a function of $C$ and $ε$. This paper presents the first Bernoulli factory for $f(p) = Cp$ with bounds on $\text{E}[T]$ as a function of the input parameters. In fact, $\sup_{p \in [0,(1-ε)/C]} \text{E}[T] \leq 9.5Cε^{-1}.$ In addition, this method is very simple to implement. Furthermore, a lower bound on the average running time of any $Cp$ Bernoulli factory is shown. For $ε\leq 1/2$, $\sup_{p \in [0,(1 - ε)/C]} \text{E}[T] \geq 0.004 C ε^{-1}$, so the new method is optimal up to a constant in the running time.

preprint2012arXiv

Spatial birth-death swap chains

Markov chains have long been used for generating random variates from spatial point processes. Broadly speaking, these chains fall into two categories: Metropolis-Hastings type chains running in discrete time and spatial birth-death chains running in continuous time. These birth-death chains only allow for removal of a point or addition of a point. In this paper it is shown that the addition of transitions where a point is moved from one location to the other can aid in shortening the mixing time of the chain. Here the mixing time of the chain is analyzed through coupling, and use of the swap moves allows for analysis of a broader class of chains. Furthermore, these swap moves can be employed in perfect sampling algorithms via the dominated coupling from the past procedure of Kendall and Møller. This method can be applied to any pairwise interaction model with repulsion. In particular, an application to the Strauss process is developed in detail, and the swap chains are shown to be much faster than standard birth-death chains.

preprint2011arXiv

Random construction of interpolating sets for high dimensional integration

Many high dimensional integrals can be reduced to the problem of finding the relative measures of two sets. Often one set will be exponentially larger than the other, making it difficult to compare the sizes. A standard method of dealing with this problem is to interpolate between the sets with a sequence of nested sets where neighboring sets have relative measures bounded above by a constant. Choosing such a well balanced sequence can be very difficult in practice. Here a new approach that automatically creates such sets is presented. These well balanced sets allow for faster approximation algorithms for integrals and sums, and better tempering and annealing Markov chains for generating random samples. Applications such as finding the partition function of the Ising model and normalizing constants for posterior distributions in Bayesian methods are discussed.