Source author record

Max Hahn-Klimroth

Max Hahn-Klimroth appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Information Theory; math.IT; Discrete Mathematics; Machine Learning; Distributed, Parallel, and Cluster Computing; math.CO; math.ST; Statistics Theory; Data Structures and Algorithms; math.PR; Social and Information Networks

Catalog footprint

What is connected

8 works

11 topics

4 close collaborators

preprint, 2022, arXiv

Distributed Reconstruction of Noisy Pooled Data

In the pooled data problem we are given a set of $n$ agents, each of which holds a hidden state bit, either $0$ or $1$. A querying procedure returns for a query set the sum of the states of the queried agents. The goal is to reconstruct the states using as few queries as possible. In this paper we consider two noise models for the pooled data problem. In the noisy channel model, the result for each agent flips with a certain probability. In the noisy query model, each query result is subject to random Gaussian noise. Our results are twofold. First, we present and analyze for both error models a simple and efficient distributed algorithm that reconstructs the initial states in a greedy fashion. Our novel analysis pins down the range of error probabilities and distributions for which our algorithm reconstructs the exact initial states with high probability. Secondly, we present simulation results of our algorithm and compare its performance with approximate message passing (AMP) algorithms that are conjectured to be optimal in a number of related problems.
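To make the two noise models concrete, the following Python sketch simulates a single query under each of them. It is an illustration under stated assumptions, not the paper's implementation: the function names and the parameters `flip_prob` and `noise_std` are hypothetical, and so is the reading that each queried agent's bit is flipped independently for every query.

```python
import random


def exact_query(states, pool):
    """Noiseless pooled query: the sum of the hidden bits of the queried agents."""
    return sum(states[i] for i in pool)


def noisy_channel_query(states, pool, flip_prob, rng):
    """Noisy channel model (assumed reading): each queried agent's bit is
    flipped independently with probability flip_prob before summation."""
    total = 0
    for i in pool:
        bit = states[i]
        if rng.random() < flip_prob:
            bit = 1 - bit
        total += bit
    return total


def noisy_query(states, pool, noise_std, rng):
    """Noisy query model: the exact sum plus zero-mean Gaussian noise."""
    return exact_query(states, pool) + rng.gauss(0.0, noise_std)


rng = random.Random(0)
states = [1, 0, 1, 1, 0, 0]   # hidden bits (toy example)
pool = [0, 1, 2, 4]           # indices of the queried agents
print(exact_query(states, pool))                    # exact sum of the pool
print(noisy_channel_query(states, pool, 0.1, rng))  # sum after random bit flips
print(noisy_query(states, pool, 0.5, rng))          # sum plus Gaussian noise
```

Setting `flip_prob` or `noise_std` to zero recovers the noiseless query, which makes the sketch convenient for checking a decoder before adding noise.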

preprint, 2022, arXiv

Inference of a Rumor's Source in the Independent Cascade Model

We consider the so-called Independent Cascade Model for rumor spreading or epidemic processes popularized by Kempe et al. [2003]. In this model, a small subset of nodes from a network are the source of a rumor. In discrete time steps, each informed node "infects" each of its uninformed neighbors with probability $p$. While many facets of this process are studied in the literature, less is known about the inference problem: given a number of infected nodes in a network, can we learn the source of the rumor? In the context of epidemiology this problem is often referred to as the patient zero problem. It belongs to a broader class of problems where the goal is to infer parameters of the underlying spreading model, see, e.g., Lokhov [NeurIPS'16] or Mastakouri et al. [NeurIPS'20]. In this work we present a maximum likelihood estimator for the rumor's source, given a snapshot of the process in terms of a set of active nodes $X$ after $t$ steps. Our results show that, for cycle-free graphs, the likelihood estimator undergoes a non-trivial phase transition as a function of $t$. We provide a rigorous analysis for two prominent classes of acyclic networks, namely $d$-regular trees and Galton-Watson trees, and verify empirically that our heuristics work well in various general networks.
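The spreading process described above can be sketched in a few lines of Python. This is a minimal simulation under assumptions, not the authors' code: the function name `independent_cascade` and the adjacency-dictionary input are hypothetical, and the sketch follows the convention in which a node attempts to inform each uninformed neighbor only once, in the round after it became informed. It produces the kind of snapshot $X$ after $t$ steps that a source estimator would take as input; the estimator itself is not shown.

```python
import random


def independent_cascade(adjacency, sources, p, steps, rng):
    """Run the cascade for `steps` rounds and return the set of informed nodes.

    Assumption: a node tries to inform each uninformed neighbor once,
    in the round right after it became informed itself.
    """
    informed = set(sources)
    frontier = set(sources)
    for _ in range(steps):
        newly_informed = set()
        for node in frontier:
            for neighbor in adjacency[node]:
                if neighbor in informed or neighbor in newly_informed:
                    continue
                if rng.random() < p:
                    newly_informed.add(neighbor)
        informed |= newly_informed
        frontier = newly_informed
    return informed


# Toy example: a path on 5 nodes with the rumor starting at node 0.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
snapshot = independent_cascade(path, {0}, p=0.7, steps=3, rng=random.Random(0))
print(sorted(snapshot))  # the set X of active nodes after t = 3 steps
```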

preprint, 2022, arXiv

Near optimal efficient decoding from pooled data

Consider $n$ items, each of which is characterised by one of $d+1$ possible features in $\{0, \ldots, d\}$. We study the inference task of learning these types by queries on subsets, or pools, of the items that only reveal a form of coarsened information on the features, in our case the sum of all the features in the pool. This is a realistic scenario in situations where one has memory or technical constraints in the data collection process, or where the data is subject to anonymisation. Related prominent problems are the quantitative group testing problem, of which it is a generalisation, as well as the compressed sensing problem, of which it is a special case. In the present article, we are interested in the minimum number of queries needed to efficiently infer the labels, if one of the features, say $0$, is dominant in the sense that the number $k$ of non-zero features among the items is much smaller than $n$. It is known that in this case, all features can be recovered in exponential time by using no more than $O(k)$ queries. However, so far, all \textit{efficient} inference algorithms required at least $Ω(k\ln n)$ queries, and it was unknown whether this gap is artificial or of a fundamental nature. Here we show that indeed, the previous gap between the information-theoretic and computational bounds is not inherent to the problem by providing an efficient algorithm that succeeds with high probability and employs no more than $O(k)$ measurements. This also solves a long-standing open question for the quantitative group testing problem.
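A standard counting argument, given here as a heuristic and not taken from the paper, suggests why a number of queries linear in $k$ is the natural information-theoretic scale. The notation $m$ for the number of queries, the assumption that exactly $k$ items carry a non-zero feature, and the parametrisation $k = n^{\theta}$ with fixed $\theta \in (0,1)$ and constant $d$ are all assumptions made for this illustration. Each query returns a value in $\{0, \ldots, kd\}$, so $m$ queries can distinguish at most $(kd+1)^m$ configurations, while $\binom{n}{k} d^k$ configurations have to be told apart:

$$ (kd+1)^m \ge \binom{n}{k} d^k \quad\Longrightarrow\quad m \ge \frac{\ln \binom{n}{k} + k \ln d}{\ln(kd+1)}. $$

With $k = n^{\theta}$ we have $\ln \binom{n}{k} \sim (1-\theta)\, k \ln n$ and $\ln(kd+1) \sim \theta \ln n$, so the right-hand side is asymptotically $\frac{1-\theta}{\theta}\, k$, which is linear in $k$.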

preprint, 2022, arXiv

On the Hierarchy of Distributed Majority Protocols

We study the Consensus problem among $n$ agents, defined as follows. Initially, each agent holds one of two possible opinions. The goal is to reach a consensus configuration in which every agent shares the same opinion. To this end, agents randomly sample other agents and update their opinion according to a simple update function depending on the sampled opinions. We consider two communication models: the gossip model and a variant of the population model. In the gossip model, agents are activated in parallel, synchronous rounds. In the population model, one agent is activated after the other in a sequence of discrete time steps. For both models we analyze the following natural family of majority processes called $j$-Majority: when activated, every agent samples $j$ other agents uniformly at random (with replacement) and adopts the majority opinion among the sample (breaking ties uniformly at random). As our main result we show a hierarchy among majority protocols: $(j+1)$-Majority (for $j > 1$) converges stochastically faster than $j$-Majority for any initial opinion configuration. In our analysis we use Strassen's Theorem to prove the existence of a coupling. This gives an affirmative answer for the case of two opinions to an open question asked by Berenbrink et al. [2017].
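The $j$-Majority update rule in the gossip model can be written down directly from the description above. The following Python sketch is illustrative only: the function names `majority_of` and `gossip_round` are hypothetical, opinions are encoded as 0 and 1, and the population model (one agent activated per step) is not shown.

```python
import random


def majority_of(sample, rng):
    """Majority opinion (0 or 1) in a sample, breaking ties uniformly at random."""
    ones = sum(sample)
    zeros = len(sample) - ones
    if ones == zeros:
        return rng.randrange(2)
    return 1 if ones > zeros else 0


def gossip_round(opinions, j, rng):
    """One synchronous round: every agent samples j other agents uniformly at
    random with replacement and adopts the majority opinion of the sample."""
    n = len(opinions)  # requires n >= 2
    updated = []
    for agent in range(n):
        sample = []
        for _ in range(j):
            other = rng.randrange(n - 1)
            if other >= agent:  # shift indices to skip the agent itself
                other += 1
            sample.append(opinions[other])
        updated.append(majority_of(sample, rng))
    return updated


# Toy run: 60 agents with opinion 1, 40 with opinion 0, using 3-Majority.
rng = random.Random(0)
opinions = [1] * 60 + [0] * 40
rounds = 0
while len(set(opinions)) > 1 and rounds < 1000:
    opinions = gossip_round(opinions, 3, rng)
    rounds += 1
print(rounds, set(opinions))  # rounds used and the final opinion(s)
```

Running the same loop with different values of `j` gives an empirical feel for the hierarchy, although a single run is of course no substitute for the stochastic dominance statement.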

preprint, 2022, arXiv

On the Parallel Reconstruction from Pooled Data

In the pooled data problem the goal is to efficiently reconstruct a binary signal from additive measurements. Given a signal $σ\in \{ 0,1 \}^n$, we can query multiple entries at once and get the total number of non-zero entries in the query as a result. We assume that queries are time-consuming and therefore focus on the setting where all queries are executed in parallel. For the regime where the signal is sparse such that $\|σ\|_1 = o(n)$ our results are twofold: First, we propose and analyze a simple and efficient greedy reconstruction algorithm. Secondly, we derive a sharp information-theoretic threshold for the minimum number of queries required to reconstruct $σ$ with high probability. Our first result matches the performance guarantees of much more involved constructions (Karimi et al. 2019). Our second result extends a result of Alaoui et al. (2014) and Scarlett & Cevher (2017) who studied the pooled data problem for dense signals. Finally, our theoretical findings are complemented with empirical simulations. Our data not only confirm the information-theoretic thresholds but also hint at the practical applicability of our pooling scheme and the simple greedy reconstruction algorithm.
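One generic way to reconstruct a sparse signal greedily from parallel additive queries is to score every entry by the average result of the pools it belongs to and to declare the $k$ highest-scoring entries to be non-zero. The Python sketch below implements that rule as an illustration; it is not claimed to be the algorithm analysed in the paper. The names `pooled_query` and `greedy_decode`, the assumption that $k$ is known to the decoder, and the random pooling with inclusion probability 1/2 in the demo are all hypothetical choices.

```python
import random


def pooled_query(signal, pool):
    """Additive query: the number of non-zero entries of the signal in the pool."""
    return sum(signal[i] for i in pool)


def greedy_decode(n, k, pools, results):
    """Score each entry by the mean result of the pools containing it and
    mark the k entries with the highest scores as ones (k assumed known)."""
    totals = [0.0] * n
    counts = [0] * n
    for pool, result in zip(pools, results):
        for i in pool:
            totals[i] += result
            counts[i] += 1
    scores = [totals[i] / counts[i] if counts[i] else 0.0 for i in range(n)]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    estimate = [0] * n
    for i in top:
        estimate[i] = 1
    return estimate


# Demo: all queries are fixed up front, so they could be executed in parallel.
rng = random.Random(0)
n, k, m = 200, 5, 150
signal = [0] * n
for i in rng.sample(range(n), k):
    signal[i] = 1
pools = [[i for i in range(n) if rng.random() < 0.5] for _ in range(m)]
results = [pooled_query(signal, pool) for pool in pools]
estimate = greedy_decode(n, k, pools, results)
print(estimate == signal)  # whether this run recovered the signal exactly
```

Because the pools do not depend on earlier results, the design is non-adaptive, which matches the parallel setting described above.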

preprint, 2022, arXiv

Statistical and Computational Phase Transitions in Group Testing

We study the group testing problem where the goal is to identify a set of $k$ infected individuals carrying a rare disease within a population of size $n$, based on the outcomes of pooled tests which return positive whenever there is at least one infected individual in the tested group. We consider two different simple random procedures for assigning individuals to tests: the constant-column design and Bernoulli design. Our first set of results concerns the fundamental statistical limits. For the constant-column design, we give a new information-theoretic lower bound which implies that the proportion of correctly identifiable infected individuals undergoes a sharp "all-or-nothing" phase transition when the number of tests crosses a particular threshold. For the Bernoulli design, we determine the precise number of tests required to solve the associated detection problem (where the goal is to distinguish between a group testing instance and pure noise), improving both the upper and lower bounds of Truong, Aldridge, and Scarlett (2020). For both group testing models, we also study the power of computationally efficient (polynomial-time) inference procedures. We determine the precise number of tests required for the class of low-degree polynomial algorithms to solve the detection problem. This provides evidence for an inherent computational-statistical gap in both the detection and recovery problems at small sparsity levels. Notably, our evidence is contrary to that of Iliopoulos and Zadik (2021), who predicted the absence of a computational-statistical gap in the Bernoulli design.
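The two random test designs and the test outcome rule can be sketched as follows. This Python example is an assumption-laden illustration, not the paper's setup: the function names, the parameters `q` (inclusion probability) and `delta` (tests per individual), and the choice to sample the tests of each individual without replacement in the constant-column design are all hypothetical.

```python
import random


def bernoulli_design(n, m, q, rng):
    """Each individual joins each of the m tests independently with probability q."""
    return [[i for i in range(n) if rng.random() < q] for _ in range(m)]


def constant_column_design(n, m, delta, rng):
    """Each individual joins exactly delta distinct tests chosen uniformly at
    random (sampling without replacement is an assumption of this sketch)."""
    pools = [[] for _ in range(m)]
    for i in range(n):
        for t in rng.sample(range(m), delta):
            pools[t].append(i)
    return pools


def pool_outcomes(pools, infected):
    """A test is positive iff its pool contains at least one infected individual."""
    return [any(i in infected for i in pool) for pool in pools]


rng = random.Random(0)
n, m, k = 100, 30, 3
infected = set(rng.sample(range(n), k))
for design in (bernoulli_design(n, m, 0.1, rng),
               constant_column_design(n, m, 3, rng)):
    outcomes = pool_outcomes(design, infected)
    print(sum(outcomes), "of", m, "tests are positive")
```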

preprint, 2022, arXiv

The full rank condition for sparse random matrices

We derive a sufficient condition for a sparse random matrix with given numbers of non-zero entries in the rows and columns to have full row rank. The result covers both matrices over finite fields with independent non-zero entries and $\{0,1\}$-matrices over the rationals. The sufficient condition is generally necessary as well.
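The distinction between the two settings matters because the rank of a $\{0,1\}$-matrix depends on the field. The Python sketch below computes the rank by Gaussian elimination over the two-element field and over the rationals; the function names are hypothetical, the code is a plain textbook elimination unrelated to the paper's proof technique, and the small dense matrix is chosen only for illustration.

```python
from fractions import Fraction


def rank_gf2(matrix):
    """Rank of a 0/1 matrix over the field with two elements."""
    rows = [list(r) for r in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] == 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] == 1:
                # Adding rows modulo 2 is a bitwise XOR of the entries.
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def rank_rational(matrix):
    """Rank of the same matrix over the rationals, using exact fractions."""
    rows = [[Fraction(x) for x in r] for r in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


matrix = [[1, 1, 0],
          [0, 1, 1],
          [1, 0, 1]]
print(rank_gf2(matrix))       # 2: the three rows sum to zero modulo 2
print(rank_rational(matrix))  # 3: the determinant over the rationals is 2
```

A matrix has full row rank exactly when the computed rank equals its number of rows, so the example matrix has full row rank over the rationals but not over the two-element field.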

preprint, 2020, arXiv

Random perturbation of sparse graphs

In the model of randomly perturbed graphs we consider the union of a deterministic graph $\mathcal{G}_α$ with minimum degree $αn$ and the binomial random graph $\mathbb{G}(n,p)$. This model was introduced by Bohman, Frieze, and Martin and for Hamilton cycles their result bridges the gap between Dirac's theorem and the results by Pósa and Koršunov on the threshold in $\mathbb{G}(n,p)$. In this note we extend this result in $\mathcal{G}_α\cup \mathbb{G}(n,p)$ to sparser graphs with $α=o(1)$. More precisely, for any $\varepsilon>0$ and $α\colon \mathbb{N} \to (0,1)$ we show that a.a.s. $\mathcal{G}_α\cup \mathbb{G}(n,β/n)$ is Hamiltonian, where $β= -(6 + \varepsilon) \log(α)$. If $α>0$ is a fixed constant this gives the aforementioned result by Bohman, Frieze, and Martin and if $α=O(1/n)$ the random part $\mathbb{G}(n,p)$ is sufficient for a Hamilton cycle. We also discuss embeddings of bounded degree trees and other spanning structures in this model, which lead to interesting questions on almost spanning embeddings into $\mathbb{G}(n,p)$.
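To get a feel for the parameters, the following Python sketch computes $β = -(6 + \varepsilon)\log(α)$ and builds the union of a deterministic graph with a sample of $\mathbb{G}(n, β/n)$. Everything beyond the formula for $β$ is an assumption made for illustration: the function names are hypothetical, the circulant base graph is just one convenient graph with minimum degree $αn$, and the sketch does not test Hamiltonicity.

```python
import itertools
import math
import random


def perturbation_density(alpha, eps):
    """beta = -(6 + eps) * log(alpha); the random part is then G(n, beta / n)."""
    return -(6 + eps) * math.log(alpha)


def binomial_random_graph(n, p, rng):
    """Sample G(n, p): each possible edge appears independently with probability p."""
    return {frozenset(e) for e in itertools.combinations(range(n), 2)
            if rng.random() < p}


def perturbed_graph(base_edges, n, p, rng):
    """Union of a deterministic edge set with an independent sample of G(n, p)."""
    return {frozenset(e) for e in base_edges} | binomial_random_graph(n, p, rng)


n, alpha, eps = 200, 0.05, 0.1
# Hypothetical base graph: a circulant graph in which vertex i is joined to
# i +/- 1, ..., i +/- 5, so every vertex has degree 10 = alpha * n.
base_edges = [(i, (i + s) % n) for i in range(n) for s in range(1, 6)]
beta = perturbation_density(alpha, eps)
edges = perturbed_graph(base_edges, n, min(1.0, beta / n), random.Random(0))
print(round(beta, 2), len(edges))  # beta and the number of edges in the union
```

Note how slowly $β$ grows as $α$ shrinks: the dependence is only logarithmic, which is what allows the random part to stay sparse.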
