Researcher profile

Huacheng Yu

Huacheng Yu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2022arXiv

Optimal bounds for approximate counting

Storing a counter incremented $N$ times would naively consume $O(\log N)$ bits of memory. In 1978 Morris described the very first streaming algorithm: the "Morris Counter". His algorithm's space bound is a random variable, and it has been shown to be $O(\log\log N + \log(1/\varepsilon) + \log(1/δ))$ bits in expectation to provide a $(1+\varepsilon)$-approximation with probability $1-δ$ to the counter's value. We provide a new simple algorithm with a simple analysis showing that randomized space $O(\log\log N + \log(1/\varepsilon) + \log\log(1/δ))$ bits suffice for the same task, i.e. an exponentially improved dependence on the inverse failure probability. We then provide a new analysis showing that the original Morris Counter itself, after a minor but necessary tweak, actually also enjoys this same improved upper bound. Lastly, we prove a new lower bound for this task showing optimality of our upper bound. We thus completely resolve the asymptotic space complexity of approximate counting. Furthermore all our constants are explicit, and our lower bound and tightest upper bound differ by a multiplicative factor of at most $3+o(1)$.

preprint2022arXiv

Strong XOR Lemma for Communication with Bounded Rounds

In this paper, we prove a strong XOR lemma for bounded-round two-player randomized communication. For a function $f:\mathcal{X}\times \mathcal{Y}\rightarrow\{0,1\}$, the $n$-fold XOR function $f^{\oplus n}:\mathcal{X}^n\times \mathcal{Y}^n\rightarrow\{0,1\}$ maps $n$ input pairs $(X_1,\ldots,X_n,Y_1,\ldots,Y_n)$ to the XOR of the $n$ output bits $f(X_1,Y_1)\oplus \cdots \oplus f(X_n, Y_n)$. We prove that if every $r$-round communication protocols that computes $f$ with probability $2/3$ uses at least $C$ bits of communication, then any $r$-round protocol that computes $f^{\oplus n}$ with probability $1/2+\exp(-O(n))$ must use $n\cdot \left(r^{-O(r)}\cdot C-1\right)$ bits. When $r$ is a constant and $C$ is sufficiently large, this is $Ω(n\cdot C)$ bits. It matches the communication cost and the success probability of the trivial protocol that computes the $n$ bits $f(X_i,Y_i)$ independently and outputs their XOR, up to a constant factor in $n$. A similar XOR lemma has been proved for $f$ whose communication lower bound can be obtained via bounding the discrepancy [Shaltiel'03]. By the equivalence between the discrepancy and the correlation with $2$-bit communication protocols [Viola-Wigderson'08], our new XOR lemma implies the previous result.

preprint2021arXiv

Near-Optimal Two-Pass Streaming Algorithm for Sampling Random Walks over Directed Graphs

For a directed graph $G$ with $n$ vertices and a start vertex $u_{\sf start}$, we wish to (approximately) sample an $L$-step random walk over $G$ starting from $u_{\sf start}$ with minimum space using an algorithm that only makes few passes over the edges of the graph. This problem found many applications, for instance, in approximating the PageRank of a webpage. If only a single pass is allowed, the space complexity of this problem was shown to be $\tildeΘ(n \cdot L)$. Prior to our work, a better space complexity was only known with $\tilde{O}(\sqrt{L})$ passes. We settle the space complexity of this random walk simulation problem for two-pass streaming algorithms, showing that it is $\tildeΘ(n \cdot \sqrt{L})$, by giving almost matching upper and lower bounds. Our lower bound argument extends to every constant number of passes $p$, and shows that any $p$-pass algorithm for this problem uses $\tildeΩ(n \cdot L^{1/p})$ space. In addition, we show a similar $\tildeΘ(n \cdot \sqrt{L})$ bound on the space complexity of any algorithm (with any number of passes) for the related problem of sampling an $L$-step random walk from every vertex in the graph.

preprint2020arXiv

Nearly Optimal Static Las Vegas Succinct Dictionary

Given a set $S$ of $n$ (distinct) keys from key space $[U]$, each associated with a value from $Σ$, the \emph{static dictionary} problem asks to preprocess these (key, value) pairs into a data structure, supporting value-retrieval queries: for any given $x\in [U]$, $\mathtt{valRet}(x)$ must return the value associated with $x$ if $x\in S$, or return $\bot$ if $x\notin S$. The special case where $|Σ|=1$ is called the \emph{membership} problem. The "textbook" solution is to use a hash table, which occupies linear space and answers each query in constant time. On the other hand, the minimum possible space to encode all (key, value) pairs is only $\mathtt{OPT}:= \lceil\lg_2\binom{U}{n}+n\lg_2|Σ|\rceil$ bits, which could be much less. In this paper, we design a randomized dictionary data structure using $\mathtt{OPT}+\mathrm{poly}\lg n+O(\lg\lg\lg\lg\lg U)$ bits of space, and it has \emph{expected constant} query time, assuming the query algorithm can access an external lookup table of size $n^{0.001}$. The lookup table depends only on $U$, $n$ and $|Σ|$, and not the input. Previously, even for membership queries and $U\leq n^{O(1)}$, the best known data structure with constant query time requires $\mathtt{OPT}+n/\mathrm{poly}\lg n$ bits of space (Pagh [Pag01] and Pǎtraşcu [Pat08]); the best-known using $\mathtt{OPT}+n^{0.999}$ space has query time $O(\lg n)$; the only known non-trivial data structure with $\mathtt{OPT}+n^{0.001}$ space has $O(\lg n)$ query time and requires a lookup table of size $\geq n^{2.99}$ (!). Our new data structure answers open questions by Pǎtraşcu and Thorup [Pat08,Tho13]. We also present a scheme that compresses a sequence $X\inΣ^n$ to its zeroth order (empirical) entropy up to $|Σ|\cdot\mathrm{poly}\lg n$ extra bits, supporting decoding each $X_i$ in $O(\lg |Σ|)$ expected time.

preprint2020arXiv

Succinct Filters for Sets of Unknown Sizes

The membership problem asks to maintain a set $S\subseteq[u]$, supporting insertions and membership queries, i.e., testing if a given element is in the set. A data structure that computes exact answers is called a dictionary. When a (small) false positive rate $ε$ is allowed, the data structure is called a filter. The space usages of the standard dictionaries or filters usually depend on the upper bound on the size of $S$, while the actual set can be much smaller. Pagh, Segev and Wieder (FOCS'13) were the first to study filters with varying space usage based on the current $|S|$. They showed in order to match the space with the current set size $n=|S|$, any filter data structure must use $(1-o(1))n(\log(1/ε)+(1-O(ε))\log\log n)$ bits, in contrast to the well-known lower bound of $N\log(1/ε)$ bits, where $N$ is an upper bound on $|S|$. They also presented a data structure with almost optimal space of $(1+o(1))n(\log(1/ε)+O(\log\log n))$ bits provided that $n>u^{0.001}$, with expected amortized constant insertion time and worst-case constant lookup time. In this work, we present a filter data structure with improvements in two aspects: - it has constant worst-case time for all insertions and lookups with high probability; - it uses space $(1+o(1))n(\log (1/ε)+\log\log n)$ bits when $n>u^{0.001}$, achieving optimal leading constant for all $ε=o(1)$. We also present a dictionary that uses $(1+o(1))n\log(u/n)$ bits of space, matching the optimal space in terms of the current size, and performs all operations in constant time with high probability.

preprint2020arXiv

Tight Distributed Sketching Lower Bound for Connectivity

In this paper, we study the distributed sketching complexity of connectivity. In distributed graph sketching, an $n$-node graph $G$ is distributed to $n$ players such that each player sees the neighborhood of one vertex. The players then simultaneously send one message to the referee, who must compute some function of $G$ with high probability. For connectivity, the referee must output whether $G$ is connected. The goal is to minimize the message lengths. Such sketching schemes are equivalent to one-round protocols in the broadcast congested clique model. We prove that the expected average message length must be at least $Ω(\log^3 n)$ bits, if the error probability is at most $1/4$. It matches the upper bound obtained by the AGM sketch [AGM12], which even allows the referee to output a spanning forest of $G$ with probability $1-1/\mathrm{poly}\, n$. Our lower bound strengthens the previous $Ω(\log^3 n)$ lower bound for spanning forest computation [NY19]. Hence, it implies that connectivity, a decision problem, is as hard as its "search" version in this model.

preprint2011arXiv

A New Variation of Hat Guessing Games

Several variations of hat guessing games have been popularly discussed in recreational mathematics. In a typical hat guessing game, after initially coordinating a strategy, each of $n$ players is assigned a hat from a given color set. Simultaneously, each player tries to guess the color of his/her own hat by looking at colors of hats worn by other players. In this paper, we consider a new variation of this game, in which we require at least $k$ correct guesses and no wrong guess for the players to win the game, but they can choose to "pass". A strategy is called {\em perfect} if it can achieve the simple upper bound $\frac{n}{n+k}$ of the winning probability. We present sufficient and necessary condition on the parameters $n$ and $k$ for the existence of perfect strategy in the hat guessing games. In fact for any fixed parameter $k$, the existence of perfect strategy can be determined for every sufficiently large $n$. In our construction we introduce a new notion: $(d_1,d_2)$-regular partition of the boolean hypercube, which is worth to study in its own right. For example, it is related to the $k$-dominating set of the hypercube. It also might be interesting in coding theory. The existence of $(d_1,d_2)$-regular partition is explored in the paper and the existence of perfect $k$-dominating set follows as a corollary.