Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
26works
0followers
16topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

26 published item(s)

preprint2026arXiv

Detecting Planted Structure in Circular Data

Hypothesis testing problems for circular data are formulated, where observations take values on the unit circle and may contain a hidden, phase-coherent structure. Under the null, the data are independent uniform on the unit circle; under the alternative, either (i) a planted subset of size K concentrates around an unknown phase (the flat setting), or (ii) a planted community of size k induces coherence among the edges of a complete graph (the community setting). In each of the two settings, two circular signal distributions are considered: a hard-cluster distribution, where correlated planted observations lie in an arc of known length and unknown location, and a von Mises distribution, where correlated planted observations follow a von Mises distribution with a common unknown location parameter. For each of the four resulting models, nearly matching necessary and sufficient conditions are derived (up to constants and occasional logarithmic factors) for detectability, thereby establishing information-theoretic phase transitions.

preprint2022arXiv

Blockchain Security when Messages are Lost

Security analyses for consensus protocols in blockchain research have primarily focused on the synchronous model, where point-to-point communication delays are upper bounded by a known finite constant. These models are unrealistic in noisy settings, where messages may be lost (i.e. incur infinite delay). In this work, we study the impact of message losses on the security of the proof-of-work longest-chain protocol. We introduce a new communication model to capture the impact of message loss called the $0-\infty$ model, and derive a region of tolerable adversarial power under which the consensus protocol is secure. The guarantees are derived as a simple bound for the probability that a transaction violates desired security properties. Specifically, we show that this violation probability decays almost exponentially in the security parameter. Our approach involves constructing combinatorial objects from blocktrees, and identifying random variables associated with them that are amenable to analysis. This approach improves existing bounds and extends the known regime for tolerable adversarial threshold in settings where messages may be lost.

preprint2021arXiv

The Longest-Chain Protocol Under Random Delays

In the field of distributed consensus and blockchains, the synchronous communication model assumes that all messages between honest parties are delayed at most by a known constant $Δ$. Recent literature establishes that the longest-chain blockchain protocol is secure under the synchronous model. However, for a fixed mining rate, the security guarantees degrade with $Δ$. We analyze the performance of the longest-chain protocol under the assumption that the communication delays are random, independent, and identically distributed. This communication model allows for distributions with unbounded support and is a strict generalization of the synchronous model. We provide safety and liveness guarantees with simple, explicit bounds on the failure probabilities. These bounds hold for infinite-horizon executions and decay exponentially with the security parameter. In particular, we show that the longest-chain protocol has good security guarantees when delays are sporadically large and possibly unbounded, which is reflective of real-world network conditions.

preprint2016arXiv

Achieving Exact Cluster Recovery Threshold via Semidefinite Programming

The binary symmetric stochastic block model deals with a random graph of $n$ vertices partitioned into two equal-sized clusters, such that each pair of vertices is connected independently with probability $p$ within clusters and $q$ across clusters. In the asymptotic regime of $p=a \log n/n$ and $q=b \log n/n$ for fixed $a,b$ and $n \to \infty$, we show that the semidefinite programming relaxation of the maximum likelihood estimator achieves the optimal threshold for exactly recovering the partition from the graph with probability tending to one, resolving a conjecture of Abbe et al. \cite{Abbe14}. Furthermore, we show that the semidefinite programming relaxation also achieves the optimal recovery threshold in the planted dense subgraph model containing a single cluster of size proportional to $n$.

preprint2016arXiv

Achieving Exact Cluster Recovery Threshold via Semidefinite Programming: Extensions

Resolving a conjecture of Abbe, Bandeira and Hall, the authors have recently shown that the semidefinite programming (SDP) relaxation of the maximum likelihood estimator achieves the sharp threshold for exactly recovering the community structure under the binary stochastic block model of two equal-sized clusters. The same was shown for the case of a single cluster and outliers. Extending the proof techniques, in this paper it is shown that SDP relaxations also achieve the sharp recovery threshold in the following cases: (1) Binary stochastic block model with two clusters of sizes proportional to network size but not necessarily equal; (2) Stochastic block model with a fixed number of equal-sized clusters; (3) Binary censored block model with the background graph being Erdős-Rényi. Furthermore, a sufficient condition is given for an SDP procedure to achieve exact recovery for the general case of a fixed number of clusters plus outliers. These results demonstrate the versatility of SDP relaxation as a simple, general purpose, computationally feasible methodology for community detection.

preprint2016arXiv

Information Limits for Recovering a Hidden Community

We study the problem of recovering a hidden community of cardinality $K$ from an $n \times n$ symmetric data matrix $A$, where for distinct indices $i,j$, $A_{ij} \sim P$ if $i, j$ both belong to the community and $A_{ij} \sim Q$ otherwise, for two known probability distributions $P$ and $Q$ depending on $n$. If $P={\rm Bern}(p)$ and $Q={\rm Bern}(q)$ with $p>q$, it reduces to the problem of finding a densely-connected $K$-subgraph planted in a large Erdös-Rényi graph; if $P=\mathcal{N}(μ,1)$ and $Q=\mathcal{N}(0,1)$ with $μ>0$, it corresponds to the problem of locating a $K \times K$ principal submatrix of elevated means in a large Gaussian random matrix. We focus on two types of asymptotic recovery guarantees as $n \to \infty$: (1) weak recovery: expected number of classification errors is $o(K)$; (2) exact recovery: probability of classifying all indices correctly converges to one. Under mild assumptions on $P$ and $Q$, and allowing the community size to scale sublinearly with $n$, we derive a set of sufficient conditions and a set of necessary conditions for recovery, which are asymptotically tight with sharp constants. The results hold in particular for the Gaussian case, and for the case of bounded log likelihood ratio, including the Bernoulli case whenever $\frac{p}{q}$ and $\frac{1-p}{1-q}$ are bounded away from zero and infinity. An important algorithmic implication is that, whenever exact recovery is information theoretically possible, any algorithm that provides weak recovery when the community size is concentrated near $K$ can be upgraded to achieve exact recovery in linear additional time by a simple voting procedure.

preprint2016arXiv

Semidefinite Programs for Exact Recovery of a Hidden Community

We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality $K$ from an $n \times n$ symmetric data matrix $A$, where for distinct indices $i,j$, $A_{ij} \sim P$ if $i, j$ are both in the community and $A_{ij} \sim Q$ otherwise, for two known probability distributions $P$ and $Q$. We identify a sufficient condition and a necessary condition for the success of SDP for the general model. For both the Bernoulli case ($P={\rm Bern}(p)$ and $Q={\rm Bern}(q)$ with $p>q$) and the Gaussian case ($P=\mathcal{N}(μ,1)$ and $Q=\mathcal{N}(0,1)$ with $μ>0$), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If $K=ω( n /\log n)$, SDP attains the information-theoretic recovery limits with sharp constants; (2) If $K=Θ(n/\log n)$, SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If $K=o(n/\log n)$ and $K \to \infty$, SDP is order-wise suboptimal. The same critical scaling for $K$ is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of $n$ vertices partitioned into multiple communities of equal size $K$. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.

preprint2016arXiv

Stability of a Random Multiple Access Channel with "Success-Failure" Feedback

We consider a model of a decentralized multiple access system with a non-standard binary feedback where the empty and collision situations cannot be distinguished. We show that, like in the case of a ternary feedback, for any input rate λ< 1/e, there exists a &#34;doubly randomized&#34; adaptive transmission protocol which stabilizes the behavior of the system. We discuss also a number of related problems and formulate some hypotheses.

preprint2015arXiv

Clustering and Inference From Pairwise Comparisons

Given a set of pairwise comparisons, the classical ranking problem computes a single ranking that best represents the preferences of all users. In this paper, we study the problem of inferring individual preferences, arising in the context of making personalized recommendations. In particular, we assume that there are $n$ users of $r$ types; users of the same type provide similar pairwise comparisons for $m$ items according to the Bradley-Terry model. We propose an efficient algorithm that accurately estimates the individual preferences for almost all users, if there are $r \max \{m, n\}\log m \log^2 n$ pairwise comparisons per type, which is near optimal in sample complexity when $r$ only grows logarithmically with $m$ or $n$. Our algorithm has three steps: first, for each user, compute the \emph{net-win} vector which is a projection of its $\binom{m}{2}$-dimensional vector of pairwise comparisons onto an $m$-dimensional linear subspace; second, cluster the users based on the net-win vectors; third, estimate a single preference for each cluster separately. The net-win vectors are much less noisy than the high dimensional vectors of pairwise comparisons and clustering is more accurate after the projection as confirmed by numerical experiments. Moreover, we show that, when a cluster is only approximately correct, the maximum likelihood estimation for the Bradley-Terry model is still close to the true preference.

preprint2015arXiv

Computational Lower Bounds for Community Detection on Random Graphs

This paper studies the problem of detecting the presence of a small dense community planted in a large Erdős-Rényi random graph $\mathcal{G}(N,q)$, where the edge probability within the community exceeds $q$ by a constant factor. Assuming the hardness of the planted clique detection problem, we show that the computational complexity of detecting the community exhibits the following phase transition phenomenon: As the graph size $N$ grows and the graph becomes sparser according to $q=N^{-α}$, there exists a critical value of $α= \frac{2}{3}$, below which there exists a computationally intensive procedure that can detect far smaller communities than any computationally efficient procedure, and above which a linear-time procedure is statistically optimal. The results also lead to the average-case hardness results for recovering the dense community and approximating the densest $K$-subgraph.

preprint2015arXiv

On Bidding with Securities: Risk Aversion and Positive Dependence

DeMarzo et al. (2005) consider auctions in which bids are selected from a completely ordered family of securities whose values are tied to the resource being auctioned. The paper defines a notion of relative steepness of families of securities and shows that a steeper family provides greater expected revenue to the seller. Two assumptions are: the buyers are risk-neutral; the random variables through which values and signals of the buyers are realized are affiliated. We show that this revenue ranking holds for the second price auction in the case of risk-aversion. However, it does not hold if affiliation is relaxed to a less restrictive form of positive dependence, namely first order stochastic dominance (FOSD). We define the relative strong steepness of families of securities and show that it provides a necessary and sufficient condition for comparing two families in the FOSD case. All results extend to the English auction.

preprint2015arXiv

Submatrix localization via message passing

The principal submatrix localization problem deals with recovering a $K\times K$ principal submatrix of elevated mean $μ$ in a large $n\times n$ symmetric matrix subject to additive standard Gaussian noise. This problem serves as a prototypical example for community detection, in which the community corresponds to the support of the submatrix. The main result of this paper is that in the regime $Ω(\sqrt{n}) \leq K \leq o(n)$, the support of the submatrix can be weakly recovered (with $o(K)$ misclassification errors on average) by an optimized message passing algorithm if $λ= μ^2K^2/n$, the signal-to-noise ratio, exceeds $1/e$. This extends a result by Deshpande and Montanari previously obtained for $K=Θ(\sqrt{n}).$ In addition, the algorithm can be extended to provide exact recovery whenever information-theoretically possible and achieve the information limit of exact recovery as long as $K \geq \frac{n}{\log n} (\frac{1}{8e} + o(1))$. The total running time of the algorithm is $O(n^2\log n)$. Another version of the submatrix localization problem, known as noisy biclustering, aims to recover a $K_1\times K_2$ submatrix of elevated mean $μ$ in a large $n_1\times n_2$ Gaussian matrix. The optimized message passing algorithm and its analysis are adapted to the bicluster problem assuming $Ω(\sqrt{n_i}) \leq K_i \leq o(n_i)$ and $K_1\asymp K_2.$ A sharp information-theoretic condition for the weak recovery of both clusters is also identified.

preprint2014arXiv

A Maximal Inequality for Supermartingales

A tight upper bound is given on the distribution of the maximum of a supermartingale. Specifically, it is shown that if $Y$ is a semimartingale with initial value zero and quadratic variation process $[Y,Y]$ such that $Y + [Y,Y]$ is a supermartingale, then the probability the maximum of $Y$ is greater than or equal to a positive constant $a$ is less than or equal to $1/(1+a).$ The proof makes use of the semimartingale calculus and is inspired by dynamic programming.

preprint2014arXiv

Jointly Clustering Rows and Columns of Binary Matrices: Algorithms and Trade-offs

In standard clustering problems, data points are represented by vectors, and by stacking them together, one forms a data matrix with row or column cluster structure. In this paper, we consider a class of binary matrices, arising in many applications, which exhibit both row and column cluster structure, and our goal is to exactly recover the underlying row and column clusters by observing only a small fraction of noisy entries. We first derive a lower bound on the minimum number of observations needed for exact cluster recovery. Then, we propose three algorithms with different running time and compare the number of observations needed by them for successful cluster recovery. Our analytical results show smooth time-data trade-offs: one can gradually reduce the computational complexity when increasingly more observations are available.

preprint2014arXiv

Minimax-optimal Inference from Partial Rankings

This paper studies the problem of inferring a global preference based on the partial rankings provided by many users over different subsets of items according to the Plackett-Luce model. A question of particular interest is how to optimally assign items to users for ranking and how many item assignments are needed to achieve a target estimation error. For a given assignment of items to users, we first derive an oracle lower bound of the estimation error that holds even for the more general Thurstone models. Then we show that the Cramér-Rao lower bound and our upper bounds inversely depend on the spectral gap of the Laplacian of an appropriately defined comparison graph. When the system is allowed to choose the item assignment, we propose a random assignment scheme. Our oracle lower bound and upper bounds imply that it is minimax-optimal up to a logarithmic factor among all assignment schemes and the lower bound can be achieved by the maximum likelihood estimator as well as popular rank-breaking schemes that decompose partial rankings into pairwise comparisons. The numerical experiments corroborate our theoretical findings.

preprint2013arXiv

Single Video Performance Analysis for Video-on-Demand Systems

We study the content placement problem for cache delivery video-on-demand systems under static random network topologies with fixed heavy-tailed video demand. The performance measure is the amount of server load; we wish to minimize the total download rate for all users from the server and maximize the rate from caches. Our approach reduces the analysis for multiple videos to consideration of decoupled systems with a single video each. For each placement policy, insights gained from the single video analysis carry back to the original multiple video content placement problem. Finally, we propose a hybrid placement technique that achieves near optimal performance with low complexity.

preprint2013arXiv

Tree dynamics for peer-to-peer streaming

This paper presents an asynchronous distributed algorithm to manage multiple trees for peer-to-peer streaming in a flow level model. It is assumed that videos are cut into substreams, with or without source coding, to be distributed to all nodes. The algorithm guarantees that each node receives sufficiently many substreams within delay logarithmic in the number of peers. The algorithm works by constantly updating the topology so that each substream is distributed through trees to as many nodes as possible without interference. Competition among trees for limited upload capacity is managed so that both coverage and balance are achieved. The algorithm is robust in that it efficiently eliminates cycles and maintains tree structures in a distributed way. The algorithm favors nodes with higher degree, so it not only works for live streaming and video on demand, but also in the case a few nodes with large degree act as servers and other nodes act as clients. A proof of convergence of the algorithm is given assuming instantaneous update of depth information, and for the case of a single tree it is shown that the convergence time is stochastically tightly bounded by a small constant times the log of the number of nodes. These theoretical results are complemented by simulations showing that the algorithm works well even when most assumptions for the theoretical tractability do not hold.

preprint2012arXiv

Auctions with a Profit Sharing Contract

We study the problem of selling a resource through an auction mechanism. The winning buyer in turn develops this resource to generate profit. Two forms of payment are considered: charging the winning buyer a one-time payment, or an initial payment plus a profit sharing contract (PSC). We consider a symmetric interdependent values model with risk averse or risk neutral buyers and a risk neutral seller. For the second price auction and the English auction, we show that the seller&#39;s expected total revenue from the auction where he also takes a fraction of the positive profit is higher than the expected revenue from the auction with only a one-time payment. Moreover, the seller can generate an even higher expected total revenue if, in addition to taking a fraction of the positive profit, he also takes the same fraction of any loss incurred from developing the resource. Moving beyond simple PSCs, we show that the auction with a PSC from a very general class generates higher expected total revenue than the auction with only a one-time payment. Finally, we show that suitable PSCs provide higher expected total revenue than a one-time payment even when the incentives of the winning buyer to develop the resource must be addressed by the seller.

preprint2012arXiv

On the Incentive to Deviate in Core Selecting Combinatorial Auctions

Recent spectrum auctions in the United Kingdom, and some proposals for future auctions of spectrum in the United States, are based on preliminary price discovery rounds, followed by calculation of final prices for the winning buyers. For example, the prices could be the projection of Vikrey prices onto the core of reported prices. The use of Vikrey prices should lead to more straightforward bidding, but the projection reverses some of the incentive for bidders to report truthfully. Still, we conjecture that the price paid by a winning buyer increases no faster than the bid, as in a first price auction. It would be rather disturbing if the conjecture is false. The conjecture is established for a buyer interacting with disjoint groups of other buyers in a star network setting. It is also shown that for any core-selecting payment rule and any integer w greater than or equal to two, there is a market setting with w winning buyers such that the price paid by some winning buyer increases at least (1-1/w) times as fast as the price bid.

preprint2012arXiv

Stability of a Peer-to-Peer Communication System

This paper focuses on the stationary portion of file download in an unstructured peer-to-peer network, which typically follows for many hours after a flash crowd initiation. The model includes the case that peers can have some pieces at the time of arrival. The contribution of the paper is to identify how much help is needed from the seeds, either fixed seeds or peer seeds (which are peers remaining in the system after obtaining a complete collection) to stabilize the system. The dominant cause for instability is the missing piece syndrome, whereby one piece becomes very rare in the network. It is shown that stability can be achieved with only a small amount of help from peer seeds--even with very little help from a fixed seed, peers need dwell as peer seeds on average only long enough to upload one additional piece. The region of stability is insensitive to the piece selection policy. Network coding can substantially increase the region of stability in case a portion of the new peers arrive with randomly coded pieces.

preprint2012arXiv

The Supermarket Game

A supermarket game is considered with $N$ FCFS queues with unit exponential service rate and global Poisson arrival rate $N λ$. Upon arrival each customer chooses a number of queues to be sampled uniformly at random and joins the least loaded sampled queue. Customers are assumed to have cost for both waiting and sampling, and they want to minimize their own expected total cost. We study the supermarket game in a mean field model that corresponds to the limit as $N$ converges to infinity in the sense that (i) for a fixed symmetric customer strategy, the joint equilibrium distribution of any fixed number of queues converges as $N \to \infty$ to a product distribution determined by the mean field model and (ii) a Nash equilibrium for the mean field model is an $ε$-Nash equilibrium for the finite $N$ model with $N$ sufficiently large. It is shown that there always exists a Nash equilibrium for $λ<1$ and the Nash equilibrium is unique with homogeneous waiting cost for $λ^2 \le 1/2$. Furthermore, we find that the action of sampling more queues by some customers has a positive externality on the other customers in the mean field model, but can have a negative externality for finite $N$.

preprint2011arXiv

The Missing Piece Syndrome in Peer-to-Peer Communication

Typical protocols for peer-to-peer file sharing over the Internet divide files to be shared into pieces. New peers strive to obtain a complete collection of pieces from other peers and from a seed. In this paper we investigate a problem that can occur if the seeding rate is not large enough. The problem is that, even if the statistics of the system are symmetric in the pieces, there can be symmetry breaking, with one piece becoming very rare. If peers depart after obtaining a complete collection, they can tend to leave before helping other peers receive the rare piece. Assuming that peers arrive with no pieces, there is a single seed, random peer contacts are made, random useful pieces are downloaded, and peers depart upon receiving the complete file, the system is stable if the seeding rate (in pieces per time unit) is greater than the arrival rate, and is unstable if the seeding rate is less than the arrival rate. The result persists for any piece selection policy that selects from among useful pieces, such as rarest first, and it persists with the use of network coding.

preprint2010arXiv

Efficiency Loss in Revenue Optimal Auctions

We study efficiency loss in Bayesian revenue optimal auctions. We quantify this as the worst case ratio of loss in the realized social welfare to the social welfare that can be realized by an efficient auction. Our focus is on auctions with single-parameter buyers and where buyers&#39; valuation sets are finite. For binary valued single-parameter buyers with independent (not necessarily identically distributed) private valuations, we show that the worst case efficiency loss ratio (ELR) is no worse than it is with only one buyer; moreover, it is at most 1/2. Moving beyond the case of binary valuations but restricting to single item auctions, where buyers&#39; private valuations are independent and identically distributed, we obtain bounds on the worst case ELR as a function of number of buyers, cardinality of buyers&#39; valuation set, and ratio of maximum to minimum possible values that buyers can have for the item.

preprint2010arXiv

Revenue Optimal Auction for Single-Minded Buyers

We study the problem of characterizing revenue optimal auctions for single-minded buyers. Each buyer is interested only in a specific bundle of items and has a value for the same. Both his bundle and its value are his private information. The bundles that buyers are interested in and their corresponding values are assumed to be realized from known probability distributions independent across the buyers. We identify revenue optimal auctions with a simple structure, if the conditional distribution of any buyer&#39;s valuation is nondecreasing, in the hazard rates ordering of probability distributions, as a function of the bundle the buyer is interested in. The revenue optimal auction is given by the solution of a maximum weight independent set problem. We provide a novel graphical construction of the weights and highlight important properties of the resulting auction.

preprint2008arXiv

Low SNR Capacity of Noncoherent Fading Channels

Discrete-time Rayleigh fading single-input single-output (SISO) and multiple-input multiple-output (MIMO) channels are considered, with no channel state information at the transmitter or the receiver. The fading is assumed to be stationary and correlated in time, but independent from antenna to antenna. Peak-power and average-power constraints are imposed on the transmit antennas. For MIMO channels, these constraints are either imposed on the sum over antennas, or on each individual antenna. For SISO channels and MIMO channels with sum power constraints, the asymptotic capacity as the peak signal-to-noise ratio tends to zero is identified; for MIMO channels with individual power constraints, this asymptotic capacity is obtained for a class of channels called transmit separable channels. The results for MIMO channels with individual power constraints are carried over to SISO channels with delay spread (i.e. frequency selective fading).

preprint2008arXiv

Substitute Valuations: Generation and Structure

Substitute valuations (in some contexts called gross substitute valuations) are prominent in combinatorial auction theory. An algorithm is given in this paper for generating a substitute valuation through Monte Carlo simulation. In addition, the geometry of the set of all substitute valuations for a fixed number of goods K is investigated. The set consists of a union of polyhedrons, and the maximal polyhedrons are identified for K=4. It is shown that the maximum dimension of the maximal polyhedrons increases with K nearly as fast as two to the power K. Consequently, under broad conditions, if a combinatorial algorithm can present an arbitrary substitute valuation given a list of input numbers, the list must grow nearly as fast as two to the power K.