Source author record

Miklos Z. Racz

Miklos Z. Racz appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

math.PR, Social and Information Networks, Information Theory, math.IT, math.ST, Statistics Theory, Computer Science and Game Theory, Data Structures and Algorithms, math.CO, physics.soc-ph, Computational Complexity, Cryptography and Security, Discrete Mathematics, Genomics, Machine Learning, Quantitative Methods

Catalog footprint

What is connected

14 works

16 topics

4 close collaborators

preprint · 2022 · arXiv

Exact Community Recovery in Correlated Stochastic Block Models

We consider the problem of learning latent community structure from multiple correlated networks. We study edge-correlated stochastic block models with two balanced communities, focusing on the regime where the average degree is logarithmic in the number of vertices. Our main result derives the precise information-theoretic threshold for exact community recovery using multiple correlated graphs. This threshold captures the interplay between the community recovery and graph matching tasks. In particular, we uncover and characterize a region of the parameter space where exact community recovery is possible using multiple correlated graphs, even though (1) this is information-theoretically impossible using a single graph and (2) exact graph matching is also information-theoretically impossible. In this regime, we develop a novel algorithm that carefully synthesizes algorithms from the community recovery and graph matching literatures.
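As a rough illustration of the setting, the sketch below samples a pair of edge-correlated graphs with two balanced communities. It assumes a subsampling construction that the abstract does not spell out: a parent block-model graph is drawn first, each child graph keeps every parent edge independently with probability `s`, and the second graph is relabeled by a random permutation. The function name, the parameters `a`, `b`, `s`, and the scaling `a log(n) / n` are illustrative assumptions only.

```python
import itertools
import math
import random


def sample_correlated_sbm(n, a, b, s, seed=None):
    """Sample two edge-correlated two-community graphs (illustrative assumption).

    Returns (labels, g1, g2_shuffled, perm), where perm[v] is the new name
    of vertex v in the second graph.
    """
    rng = random.Random(seed)
    # Two balanced communities, labeled +1 and -1.
    labels = [1 if i < n // 2 else -1 for i in range(n)]
    rng.shuffle(labels)
    # Logarithmic average degree: edge probabilities of order log(n) / n.
    p = min(1.0, a * math.log(n) / n)
    q = min(1.0, b * math.log(n) / n)
    g1, g2 = set(), set()
    for u, v in itertools.combinations(range(n), 2):
        prob = p if labels[u] == labels[v] else q
        if rng.random() < prob:          # edge present in the parent graph
            if rng.random() < s:         # kept in the first child graph
                g1.add((u, v))
            if rng.random() < s:         # kept independently in the second
                g2.add((u, v))
    # Hide the vertex correspondence of the second graph.
    perm = list(range(n))
    rng.shuffle(perm)
    g2_shuffled = {tuple(sorted((perm[u], perm[v]))) for u, v in g2}
    return labels, g1, g2_shuffled, perm
```

Recovering `labels` from `g1` and `g2_shuffled` without access to `perm` is where the community recovery and graph matching tasks meet.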

preprint · 2021 · arXiv

Batch Optimization for DNA Synthesis

Large pools of synthetic DNA molecules have been recently used to reliably store significant volumes of digital data. While DNA as a storage medium has enormous potential because of its high storage density, its practical use is currently severely limited because of the high cost and low throughput of available DNA synthesis technologies. We study the role of batch optimization in reducing the cost of large scale DNA synthesis, which translates to the following algorithmic task. Given a large pool $\mathcal{S}$ of random quaternary strings of fixed length, partition $\mathcal{S}$ into batches in a way that minimizes the sum of the lengths of the shortest common supersequences across batches. We introduce two ideas for batch optimization that both improve (in different ways) upon a naive baseline: (1) using both $(ACGT)^{*}$ and its reverse $(TGCA)^{*}$ as reference strands, and batching appropriately, and (2) batching via the quantiles of an appropriate ordering of the strands. We also prove asymptotically matching lower bounds on the cost of DNA synthesis, showing that one cannot improve upon these two ideas. Our results uncover a surprising separation between two cases that naturally arise in the context of DNA data storage: the asymptotic cost savings of batch optimization are significantly greater in the case where strings in $\mathcal{S}$ do not contain repeats of the same character (homopolymers), as compared to the case where strings in $\mathcal{S}$ are unconstrained.
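To make the role of the reference strands concrete, here is a minimal sketch. It assumes a simplified cost model in which a batch is synthesized against a fixed periodic reference, so the batch cost is the longest prefix of the reference that any strand in the batch needs. This only upper-bounds the shortest common supersequence objective described above, and the helper names are hypothetical, not taken from the paper.

```python
def synthesis_cost(strand, reference="ACGT"):
    """Length of the shortest prefix of reference* containing strand as a subsequence."""
    pos = 0
    period = len(reference)
    for base in strand:
        if base not in reference:
            raise ValueError(f"unexpected base {base!r}")
        # Skip reference positions until the next occurrence of this base.
        while reference[pos % period] != base:
            pos += 1
        pos += 1  # consume the matching reference position
    return pos


def batch_cost(batch, reference="ACGT"):
    """Cost of a batch under a shared reference: the worst strand dominates."""
    return max((synthesis_cost(s, reference) for s in batch), default=0)


def split_by_reference(strands):
    """Send each strand to whichever of (ACGT)* or (TGCA)* synthesizes it faster."""
    forward, reverse = [], []
    for s in strands:
        if synthesis_cost(s, "ACGT") <= synthesis_cost(s, "TGCA"):
            forward.append(s)
        else:
            reverse.append(s)
    return forward, reverse
```

For the pool `["ACGT", "TGCA", "AACC"]`, a single batch against `(ACGT)*` is dominated by `TGCA`, while splitting the pool by reference places that strand in a batch where it is cheap.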

preprint · 2020 · arXiv

Correlated randomly growing graphs

We introduce a new model of correlated randomly growing graphs and study the fundamental questions of detecting correlation and estimating aspects of the correlated structure. The model is simple and starts with any model of randomly growing graphs, such as uniform attachment (UA) or preferential attachment (PA). Given such a model, a pair of graphs $(G_1, G_2)$ is grown in two stages: until time $t_{\star}$ they are grown together (i.e., $G_1 = G_2$), after which they grow independently according to the underlying growth model. We show that whenever the seed graph has an influence in the underlying graph growth model---this has been shown for PA and UA trees and is conjectured to hold broadly---then correlation can be detected in this model, even if the graphs are grown together for just a single time step. We also give a general sufficient condition (which holds for PA and UA trees) under which detection is possible with probability going to $1$ as $t_{\star} \to \infty$. Finally, we show for PA and UA trees that the amount of correlation, measured by $t_{\star}$, can be estimated with vanishing relative error as $t_{\star} \to \infty$.
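The two-stage growth can be sketched for uniform attachment trees as follows. This is an illustrative simulation under stated assumptions: the seed is a single vertex, vertex `v` arrives at time `v`, and each tree is stored as a hypothetical parent map. The model itself allows other seeds and other growth rules.

```python
import random


def grow_correlated_ua_trees(n, t_star, seed=None):
    """Grow two uniform attachment trees that coincide up to time t_star.

    Each tree is a dict mapping vertex v (2..n) to its parent in 1..v-1.
    """
    if not 1 <= t_star <= n:
        raise ValueError("t_star must lie between 1 and n")
    rng = random.Random(seed)
    # Stage 1: a single shared history, so G_1 = G_2 up to time t_star.
    shared = {v: rng.randint(1, v - 1) for v in range(2, t_star + 1)}

    def extend():
        # Stage 2: continue from the shared tree with fresh randomness.
        parents = dict(shared)
        for v in range(t_star + 1, n + 1):
            parents[v] = rng.randint(1, v - 1)
        return parents

    return extend(), extend()
```

Detecting correlation then amounts to deciding, from the two final trees alone, whether they were grown this way or fully independently.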

preprint · 2020 · arXiv

Network disruption: maximizing disagreement and polarization in social networks

Recent years have seen a marked increase in the spread of misinformation, a phenomenon which has been accelerated and amplified by social media such as Facebook and Twitter. While some actors spread misinformation to push a specific agenda, it has also been widely documented that others aim to simply disrupt the network by increasing disagreement and polarization across the network and thereby destabilizing society. Popular social networks are also vulnerable to large-scale attacks. Motivated by this reality, we introduce a simple model of network disruption where an adversary can take over a limited number of user profiles in a social network with the aim of maximizing disagreement and/or polarization in the network. We investigate this model both theoretically and empirically. We show that the adversary will always change the opinion of a taken-over profile to an extreme in order to maximize disruption. We also prove that an adversary can increase disagreement / polarization at most linearly in the number of user profiles it takes over. Furthermore, we present a detailed empirical study of several natural algorithms for the adversary on both synthetic networks and real-world (Reddit and Twitter) data sets. These show that even simple, unsophisticated heuristics, such as targeting centrists, can disrupt a network effectively, causing a large increase in disagreement / polarization. Studying the problem of network disruption through the lens of an adversary thus highlights the seriousness of the problem.

preprint · 2020 · arXiv

Reconstructing Trees from Traces

We study the problem of learning a node-labeled tree given independent traces from an appropriately defined deletion channel. This problem, tree trace reconstruction, generalizes string trace reconstruction, which corresponds to the tree being a path. For many classes of trees, including complete trees and spiders, we provide algorithms that reconstruct the labels using only a polynomial number of traces. This exhibits a stark contrast to known results on string trace reconstruction, which require exponentially many traces, and where a central open problem is to determine whether a polynomial number of traces suffice. Our techniques combine novel combinatorial and complex analytic methods.
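For orientation, the string special case (the tree being a path) can be simulated in a few lines. This sketch assumes the usual i.i.d. deletion channel on strings, where each symbol is removed independently with probability `q`. It does not reproduce the tree deletion channels or the reconstruction algorithms of the paper, and the function names are hypothetical.

```python
import random


def deletion_trace(s, q, rng):
    """Pass s through a deletion channel: drop each symbol independently w.p. q."""
    return "".join(ch for ch in s if rng.random() >= q)


def sample_traces(s, q, num_traces, seed=None):
    """Draw independent traces of the same unknown string."""
    rng = random.Random(seed)
    return [deletion_trace(s, q, rng) for _ in range(num_traces)]


def is_subsequence(t, s):
    """Check that t can be obtained from s by deletions only."""
    it = iter(s)
    return all(ch in it for ch in t)
```

Trace reconstruction asks how large `num_traces` must be before `s` can be recovered from the output of `sample_traces` with high probability.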

preprint · 2020 · arXiv

Rumor source detection with multiple observations under adaptive diffusions

Recent work, motivated by anonymous messaging platforms, has introduced adaptive diffusion protocols which can obfuscate the source of a rumor: a "snapshot adversary" with access to the subgraph of "infected" nodes can do no better than randomly guessing the identity of the source node. What happens if the adversary has access to multiple independent snapshots? We study this question when the underlying graph is the infinite $d$-regular tree. We show that (1) a weak form of source obfuscation is still possible in the case of two independent snapshots, but (2) already with three observations there is a simple algorithm that finds the rumor source with constant probability, regardless of the adaptive diffusion protocol. We also characterize the tradeoff between local spreading and source obfuscation for adaptive diffusion protocols (under a single snapshot). These results raise questions about the robustness of anonymity guarantees when spreading information in social networks.

preprint · 2016 · arXiv

Basic models and questions in statistical network analysis

Extracting information from large graphs has become an important statistical problem since network data is now common in various fields. In this minicourse we will investigate the most natural statistical questions for three canonical probabilistic models of networks: (i) community detection in the stochastic block model, (ii) finding the embedding of a random geometric graph, and (iii) finding the original vertex in a preferential attachment tree. Along the way we will cover many interesting topics in probability theory such as Pólya urns, large deviation theory, concentration of measure in high dimension, entropic central limit theorems, and more.

preprint · 2016 · arXiv

Sequence assembly from corrupted shotgun reads

The prevalent technique for DNA sequencing consists of two main steps: shotgun sequencing, where many randomly located fragments, called reads, are extracted from the overall sequence, followed by an assembly algorithm that aims to reconstruct the original sequence. There are many different technologies that generate the reads: widely-used second-generation methods create short reads with low error rates, while emerging third-generation methods create long reads with high error rates. Both error rates and error profiles differ among methods, so reconstruction algorithms are often tailored to specific shotgun sequencing technologies. As these methods change over time, a fundamental question is whether there exist reconstruction algorithms which are robust, i.e., which perform well under a wide range of error distributions. Here we study this question of sequence assembly from corrupted reads. We make no assumption on the types of errors in the reads, but only assume a bound on their magnitude. More precisely, for each read we assume that instead of receiving the true read with no errors, we receive a corrupted read which has edit distance at most $\varepsilon$ times the length of the read from the true read. We show that if the reads are long enough and there are sufficiently many of them, then approximate reconstruction is possible: we construct a simple algorithm such that for almost all original sequences the output of the algorithm is a sequence whose edit distance from the original one is at most $O(\varepsilon)$ times the length of the original sequence.
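The corruption model can be checked directly with a standard edit distance routine. The sketch below assumes unit-cost insertions, deletions and substitutions (Levenshtein distance). The helper `within_corruption_budget` is a hypothetical name for the condition that a corrupted read stays within the allowed fraction of the read length, and this is not the assembly algorithm of the paper.

```python
def edit_distance(a, b):
    """Levenshtein distance between a and b via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def within_corruption_budget(true_read, observed_read, eps):
    """True if the observed read is within eps * len(true_read) edits of the true read."""
    return edit_distance(true_read, observed_read) <= eps * len(true_read)
```

The same function measures the quality of a reconstruction: the guarantee above bounds `edit_distance(original, output)` by a constant multiple of `eps * len(original)`.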

preprint · 2015 · arXiv

Beta-gamma tail asymptotics

We compute the tail asymptotics of the product of a beta random variable and a generalized gamma random variable which are independent and have general parameters. A special case of these asymptotics was proved and used in a recent work of Bubeck, Mossel, and Rácz in order to determine the tail asymptotics of the maximum degree of the preferential attachment tree. The proof presented here is simpler and highlights why these asymptotics hold.

preprint · 2015 · arXiv

Coexistence in preferential attachment networks

We introduce a new model of competition on growing networks. This extends the preferential attachment model, with the key property that node choices evolve simultaneously with the network. When a new node joins the network, it chooses neighbours by preferential attachment, and selects its type based on the number of initial neighbours of each type. The model is analysed in detail, and in particular, we determine the possible proportions of the various types in the limit of large networks. An important qualitative feature we find is that, in contrast to many current theoretical models, often several competitors will coexist. This matches empirical observations in many real-world networks.

preprint · 2013 · arXiv

A Smooth Transition from Powerlessness to Absolute Power

We study the phase transition of the coalitional manipulation problem for generalized scoring rules. Previously it has been shown that, under some conditions on the distribution of votes, if the number of manipulators is $o(\sqrt{n})$, where $n$ is the number of voters, then the probability that a random profile is manipulable by the coalition goes to zero as the number of voters goes to infinity, whereas if the number of manipulators is $\omega(\sqrt{n})$, then the probability that a random profile is manipulable goes to one. Here we consider the critical window, where a coalition has size $c\sqrt{n}$, and we show that as $c$ goes from zero to infinity, the limiting probability that a random profile is manipulable goes from zero to one in a smooth fashion, i.e., there is a smooth phase transition between the two regimes. This result analytically validates recent empirical results, and suggests that deciding the coalitional manipulation problem may be of limited computational hardness in practice.

preprint · 2012 · arXiv

A quantitative Gibbard-Satterthwaite theorem without neutrality

Recently, quantitative versions of the Gibbard-Satterthwaite theorem were proven for $k=3$ alternatives by Friedgut, Kalai, Keller and Nisan and for neutral functions on $k \geq 4$ alternatives by Isaksson, Kindler and Mossel. We prove a quantitative version of the Gibbard-Satterthwaite theorem for general social choice functions for any number $k \geq 3$ of alternatives. In particular we show that for a social choice function $f$ on $k \geq 3$ alternatives and $n$ voters, which is $\varepsilon$-far from the family of nonmanipulable functions, a uniformly chosen voter profile is manipulable with probability at least inverse polynomial in $n$, $k$, and $\varepsilon^{-1}$. Removing the neutrality assumption of previous theorems is important for multiple reasons. For one, it is known that there is a conflict between anonymity and neutrality, and since most common voting rules are anonymous, they cannot always be neutral. Second, virtual elections are used in many applications in artificial intelligence, where there are often restrictions on the outcome of the election, and so neutrality is not a natural assumption in these situations. Ours is a unified proof which, in particular, covers all previously established cases. The proof crucially uses reverse hypercontractivity in addition to several ideas from the two previous proofs. Much of the work is devoted to understanding functions of a single voter, and in particular we also prove a quantitative Gibbard-Satterthwaite theorem for one voter.

preprint · 2012 · arXiv

Modeling Flocks and Prices: Jumping Particles with an Attractive Interaction

We introduce and investigate a new model of a finite number of particles jumping forward on the real line. The jump lengths are independent of everything, but the jump rate of each particle depends on the relative position of the particle compared to the center of mass of the system. The rates are higher for those left behind, and lower for those ahead of the center of mass, providing an attractive interaction keeping the particles together. We prove that in the fluid limit, as the number of particles goes to infinity, the evolution of the system is described by a mean field equation that exhibits traveling wave solutions. A connection to extreme value statistics is also provided.
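A small simulation can make the attractive interaction tangible. The abstract does not fix the rate function or the jump distribution, so the sketch below assumes, purely for illustration, a jump rate of `exp(-(x - m))` for a particle at position `x` when the center of mass is `m`, Exp(1) jump lengths, and all particles starting at 0. It uses a standard Gillespie-style event loop.

```python
import math
import random


def simulate_particles(n, t_end, seed=None):
    """Simulate n forward-jumping particles until time t_end; return positions."""
    rng = random.Random(seed)
    x = [0.0] * n
    t = 0.0
    while True:
        m = sum(x) / n  # center of mass
        # Assumed rate function: higher behind the center of mass, lower ahead.
        rates = [math.exp(-(xi - m)) for xi in x]
        t += rng.expovariate(sum(rates))  # waiting time until the next jump
        if t > t_end:
            return x
        # The jumping particle is chosen with probability proportional to its rate.
        i = rng.choices(range(n), weights=rates)[0]
        x[i] += rng.expovariate(1.0)      # jump length independent of everything
```

Plotting a histogram of the returned positions for increasing `t_end` gives a rough picture of the particle cloud moving forward while staying together.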

preprint · 2012 · arXiv

Modeling Flocks and Prices: Jumping Particles with an Attractive Interaction (shortened version)