Source author record

Vahab Mirrokni

Vahab Mirrokni appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Machine Learning Computer Science and Game Theory Distributed, Parallel, and Cluster Computing math.OC Artificial Intelligence Cryptography and Security Social and Information Networks Computation and Language Computational Geometry math.PR math.ST Multiagent Systems Networking and Internet Architecture physics.soc-ph Statistics Theory Systems and Control

Catalog footprint

What is connected

35works

17topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2025arXiv

Lattice: Learning to Efficiently Compress the Memory

Attention mechanisms have revolutionized sequence learning but suffer from quadratic computational complexity. This paper introduces \model, a novel recurrent neural network (RNN) mechanism that leverages the inherent low-rank structure of K-V matrices to efficiently compress the cache into a fixed number of memory slots, achieving sub-quadratic complexity. We formulate this compression as an online optimization problem and derive a dynamic memory update rule based on a single gradient descent step. The resulting recurrence features a state- and input-dependent gating mechanism, offering an interpretable memory update process. The core innovation is the orthogonal update: each memory slot is updated exclusively with information orthogonal to its current state, hence incorporating only novel, non-redundant data to minimize interference with previously stored information. We derive an efficient computation for this orthogonal update rule and further approximate it with chunk-wise parallelization to ensure training scalability. Empirically, Lattice outperforms strong baselines on language modeling and associative recall tasks across diverse context lengths and model sizes, achieving superior memory efficiency with significantly reduced memory sizes.

preprint2025arXiv

MS-SSM: A Multi-Scale State Space Model for Efficient Sequence Modeling

State-space models (SSMs) have recently attention as an efficient alternative to computationally expensive attention-based models for sequence modeling. They rely on linear recurrences to integrate information over time, enabling fast inference, parallelizable training, and control over recurrence stability. However, traditional SSMs often suffer from limited effective memory, requiring larger state sizes for improved recall. Moreover, existing SSMs struggle to capture multi-scale dependencies, which are essential for modeling complex structures in time series, images, and natural language. This paper introduces a multi-scale SSM framework that addresses these limitations by representing sequence dynamics across multiple resolution and processing each resolution with specialized state-space dynamics. By capturing both fine-grained, high-frequency patterns and coarse, global trends, MS-SSM enhances memory efficiency and long-range modeling. We further introduce an input-dependent scale-mixer, enabling dynamic information fusion across resolutions. The proposed approach significantly improves sequence modeling, particularly in long-range and hierarchical tasks, while maintaining computational efficiency. Extensive experiments on benchmarks, including Long Range Arena, hierarchical reasoning, time series classification, and image recognition, demonstrate that MS-SSM consistently outperforms prior SSM-based models, highlighting the benefits of multi-resolution processing in state-space architectures.

preprint2025arXiv

Nested Learning: The Illusion of Deep Learning Architectures

Despite the recent progresses, particularly in developing Language Models, there are fundamental challenges and unanswered questions about how such models can continually learn/memorize, self-improve, and find effective solutions. In this paper, we present a new learning paradigm, called Nested Learning (NL), that coherently represents a machine learning model with a set of nested, multi-level, and/or parallel optimization problems, each of which with its own context flow. Through the lenses of NL, existing deep learning methods learns from data through compressing their own context flow, and in-context learning naturally emerges in large models. NL suggests a philosophy to design more expressive learning algorithms with more levels, resulting in higher-order in-context learning and potentially unlocking effective continual learning capabilities. We advocate for NL by presenting three core contributions: (1) Expressive Optimizers: We show that known gradient-based optimizers, such as Adam, SGD with Momentum, etc., are in fact associative memory modules that aim to compress the gradients' information (by gradient descent). Building on this insight, we present other more expressive optimizers with deep memory and/or more powerful learning rules; (2) Self-Modifying Learning Module: Taking advantage of NL's insights on learning algorithms, we present a sequence model that learns how to modify itself by learning its own update algorithm; and (3) Continuum Memory System: We present a new formulation for memory system that generalizes the traditional viewpoint of long/short-term memory. Combining our self-modifying sequence model with the continuum memory system, we present a continual learning module, called Hope, showing promising results in language modeling, knowledge incorporation, and few-shot generalization tasks, continual learning, and long-context reasoning tasks.

preprint2025arXiv

Optimal Approximation -- Smoothness Tradeoffs for Soft-Max Functions

A soft-max function has two main efficiency measures: (1) approximation - which corresponds to how well it approximates the maximum function, (2) smoothness - which shows how sensitive it is to changes of its input. Our goal is to identify the optimal approximation-smoothness tradeoffs for different measures of approximation and smoothness. This leads to novel soft-max functions, each of which is optimal for a different application. The most commonly used soft-max function, called exponential mechanism, has optimal tradeoff between approximation measured in terms of expected additive approximation and smoothness measured with respect to Rényi Divergence. We introduce a soft-max function, called "piecewise linear soft-max", with optimal tradeoff between approximation, measured in terms of worst-case additive approximation and smoothness, measured with respect to $\ell_q$-norm. The worst-case approximation guarantee of the piecewise linear mechanism enforces sparsity in the output of our soft-max function, a property that is known to be important in Machine Learning applications [Martins et al. '16, Laha et al. '18] and is not satisfied by the exponential mechanism. Moreover, the $\ell_q$-smoothness is suitable for applications in Mechanism Design and Game Theory where the piecewise linear mechanism outperforms the exponential mechanism. Finally, we investigate another soft-max function, called power mechanism, with optimal tradeoff between expected \textit{multiplicative} approximation and smoothness with respect to the Rényi Divergence, which provides improved theoretical and practical results in differentially private submodular optimization.

preprint2025arXiv

Trellis: Learning to Compress Key-Value Memory in Attention Models

Transformers, while powerful, suffer from quadratic computational complexity and the ever-growing Key-Value (KV) cache of the attention mechanism. This paper introduces Trellis, a novel Transformer architecture with bounded memory that learns how to compress its key-value memory dynamically at test time. Trellis replaces the standard KV cache with a fixed-size memory and train a two-pass recurrent compression mechanism to store new keys and values into memory. To achieve this, it leverages an online gradient descent procedure with a forget gate, enabling the compressed memory to be updated recursively while learning to retain important contextual information from incoming tokens at test time. Extensive experiments on language modeling, common-sense reasoning, recall-intensive tasks, and time series show that the proposed architecture outperforms strong baselines. Notably, its performance gains increase as the sequence length grows, highlighting its potential for long-context applications.

preprint2023arXiv

Differentially Private Continual Releases of Streaming Frequency Moment Estimations

The streaming model of computation is a popular approach for working with large-scale data. In this setting, there is a stream of items and the goal is to compute the desired quantities (usually data statistics) while making a single pass through the stream and using as little space as possible. Motivated by the importance of data privacy, we develop differentially private streaming algorithms under the continual release setting, where the union of outputs of the algorithm at every timestamp must be differentially private. Specifically, we study the fundamental $\ell_p$ $(p\in [0,+\infty))$ frequency moment estimation problem under this setting, and give an $\varepsilon$-DP algorithm that achieves $(1+η)$-relative approximation $(\forall η\in(0,1))$ with $\mathrm{poly}\log(Tn)$ additive error and uses $\mathrm{poly}\log(Tn)\cdot \max(1, n^{1-2/p})$ space, where $T$ is the length of the stream and $n$ is the size of the universe of elements. Our space is near optimal up to poly-logarithmic factors even in the non-private setting. To obtain our results, we first reduce several primitives under the differentially private continual release model, such as counting distinct elements, heavy hitters and counting low frequency elements, to the simpler, counting/summing problems in the same setting. Based on these primitives, we develop a differentially private continual release level set estimation approach to address the $\ell_p$ frequency moment estimation problem. We also provide a simple extension of our results to the harder sliding window model, where the statistics must be maintained over the past $W$ data items.

preprint2023arXiv

Stars: Tera-Scale Graph Building for Clustering and Graph Learning

A fundamental procedure in the analysis of massive datasets is the construction of similarity graphs. Such graphs play a key role for many downstream tasks, including clustering, classification, graph learning, and nearest neighbor search. For these tasks, it is critical to build graphs which are sparse yet still representative of the underlying data. The benefits of sparsity are twofold: firstly, constructing dense graphs is infeasible in practice for large datasets, and secondly, the runtime of downstream tasks is directly influenced by the sparsity of the similarity graph. In this work, we present $\textit{Stars}$: a highly scalable method for building extremely sparse graphs via two-hop spanners, which are graphs where similar points are connected by a path of length at most two. Stars can construct two-hop spanners with significantly fewer similarity comparisons, which are a major bottleneck for learning based models where comparisons are expensive to evaluate. Theoretically, we demonstrate that Stars builds a graph in nearly-linear time, where approximate nearest neighbors are contained within two-hop neighborhoods. In practice, we have deployed Stars for multiple data sets allowing for graph building at the $\textit{Tera-Scale}$, i.e., for graphs with tens of trillions of edges. We evaluate the performance of Stars for clustering and graph learning, and demonstrate 10~1000-fold improvements in pairwise similarity comparisons compared to different baselines, and 2~10-fold improvement in running time without quality loss.

preprint2022arXiv

Constant Approximation for Normalized Modularity and Associations Clustering

We study the problem of graph clustering under a broad class of objectives in which the quality of a cluster is defined based on the ratio between the number of edges in the cluster, and the total weight of vertices in the cluster. We show that our definition is closely related to popular clustering measures, namely normalized associations, which is a dual of the normalized cut objective, and normalized modularity. We give a linear time constant-approximate algorithm for our objective, which implies the first constant-factor approximation algorithms for normalized modularity and normalized associations.

preprint2022arXiv

Efficiency of the First-Price Auction in the Autobidding World

We study the price of anarchy of the first-price auction in the autobidding world, where bidders can be either utility maximizers (i.e., traditional bidders) or value maximizers (i.e., autobidders). We show that with autobidders only, the price of anarchy of the first-price auction is $1/2$, and with both kinds of bidders, the price of anarchy degrades to about $0.457$ (the precise number is given by an optimization). These results complement the recent result by Jin and Lu [2022] showing that the price of anarchy of the first-price auction with traditional bidders only is $1 - 1/e^2$. We further investigate a setting where the seller can utilize machine-learned advice to improve the efficiency of the auctions. There, we show that as the accuracy of the advice increases, the price of anarchy improves smoothly from about $0.457$ to $1$.

preprint2022arXiv

Hierarchical Clustering in Graph Streams: Single-Pass Algorithms and Space Lower Bounds

The Hierarchical Clustering (HC) problem consists of building a hierarchy of clusters to represent a given dataset. Motivated by the modern large-scale applications, we study the problem in the \streaming model, in which the memory is heavily limited and only a single or very few passes over the input are allowed. Specifically, we investigate whether a good hierarchical clustering can be obtained, or at least whether we can approximately estimate the value of the optimal hierarchy. To measure the quality of a hierarchy, we use the HC minimization objective introduced by Dasgupta. Assuming that the input is an $n$-vertex weighted graph whose edges arrive in a stream, we derive the following results on space-vs-accuracy tradeoffs: * With $O(n\cdot \text{polylog}\,{n})$ space, we develop a single-pass algorithm, whose approximation ratio matches the currently best offline algorithm. * When the space is more limited, namely, $n^{1-o(1)}$, we prove that no algorithm can even estimate the value of optimum HC tree to within an $o(\frac{\log{n}}{\log\log{n}})$ factor, even when allowed $\text{polylog}{\,{n}}$ passes over the input. * In the most stringent setting of $\text{polylog}\,{n}$ space, we rule out algorithms that can even distinguish between "highly"-vs-"poorly" clusterable graphs, namely, graphs that have an $n^{1/2-o(1)}$ factor gap between their HC objective value. * Finally, we prove that any single-pass streaming algorithm that computes an optimal HC tree requires to store almost the entire input even if allowed exponential time. Our algorithmic results establish a general structural result that proves that cut sparsifiers of input graph can preserve cost of "balanced" HC trees to within a constant factor. Our lower bound results include a new streaming lower bound for a novel problem "One-vs-Many-Expanders", which can be of independent interest.

preprint2022arXiv

Improved Approximations for Euclidean $k$-means and $k$-median, via Nested Quasi-Independent Sets

Motivated by data analysis and machine learning applications, we consider the popular high-dimensional Euclidean $k$-median and $k$-means problems. We propose a new primal-dual algorithm, inspired by the classic algorithm of Jain and Vazirani and the recent algorithm of Ahmadian, Norouzi-Fard, Svensson, and Ward. Our algorithm achieves an approximation ratio of $2.406$ and $5.912$ for Euclidean $k$-median and $k$-means, respectively, improving upon the 2.633 approximation ratio of Ahmadian et al. and the 6.1291 approximation ratio of Grandoni, Ostrovsky, Rabani, Schulman, and Venkat. Our techniques involve a much stronger exploitation of the Euclidean metric than previous work on Euclidean clustering. In addition, we introduce a new method of removing excess centers using a variant of independent sets over graphs that we dub a "nested quasi-independent set". In turn, this technique may be of interest for other optimization problems in Euclidean and $\ell_p$ metric spaces.

preprint2022arXiv

Scalable Differentially Private Clustering via Hierarchically Separated Trees

We study the private $k$-median and $k$-means clustering problem in $d$ dimensional Euclidean space. By leveraging tree embeddings, we give an efficient and easy to implement algorithm, that is empirically competitive with state of the art non private methods. We prove that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / ε^2)$, where $ε$ is the privacy guarantee. (The dimension term, $d$, can be replaced with $O(\log k)$ using standard dimension reduction techniques.) Although the worst-case guarantee is worse than that of state of the art private clustering methods, the algorithm we propose is practical, runs in near-linear, $\tilde{O}(nkd)$, time and scales to tens of millions of points. We also show that our method is amenable to parallelization in large-scale distributed computing environments. In particular we show that our private algorithms can be implemented in logarithmic number of MPC rounds in the sublinear memory regime. Finally, we complement our theoretical analysis with an empirical evaluation demonstrating the algorithm's efficiency and accuracy in comparison to other privacy clustering baselines.

preprint2022arXiv

Tight and Robust Private Mean Estimation with Few Users

In this work, we study high-dimensional mean estimation under user-level differential privacy, and design an $(\varepsilon,δ)$-differentially private mechanism using as few users as possible. In particular, we provide a nearly optimal trade-off between the number of users and the number of samples per user required for private mean estimation, even when the number of users is as low as $O(\frac{1}{\varepsilon}\log\frac{1}δ)$. Interestingly, this bound on the number of \emph{users} is independent of the dimension (though the number of \emph{samples per user} is allowed to depend polynomially on the dimension), unlike the previous work that requires the number of users to depend polynomially on the dimension. This resolves a problem first proposed by Amin et al. Moreover, our mechanism is robust against corruptions in up to $49\%$ of the users. Finally, our results also apply to optimal algorithms for privately learning discrete distributions with few users, answering a question of Liu et al., and a broader range of problems such as stochastic convex optimization and a variant of stochastic gradient descent via a reduction to differentially private mean estimation.

preprint2021arXiv

Batched Neural Bandits

In many sequential decision-making problems, the individuals are split into several batches and the decision-maker is only allowed to change her policy at the end of batches. These batch problems have a large number of applications, ranging from clinical trials to crowdsourcing. Motivated by this, we study the stochastic contextual bandit problem for general reward distributions under the batched setting. We propose the BatchNeuralUCB algorithm which combines neural networks with optimism to address the exploration-exploitation tradeoff while keeping the total number of batches limited. We study BatchNeuralUCB under both fixed and adaptive batch size settings and prove that it achieves the same regret as the fully sequential version while reducing the number of policy updates considerably. We confirm our theoretical results via simulations on both synthetic and real-world datasets.

preprint2021arXiv

Limiting Behaviors of Nonconvex-Nonconcave Minimax Optimization via Continuous-Time Systems

Unlike nonconvex optimization, where gradient descent is guaranteed to converge to a local optimizer, algorithms for nonconvex-nonconcave minimax optimization can have topologically different solution paths: sometimes converging to a solution, sometimes never converging and instead following a limit cycle, and sometimes diverging. In this paper, we study the limiting behaviors of three classic minimax algorithms: gradient descent ascent (GDA), alternating gradient descent ascent (AGDA), and the extragradient method (EGM). Numerically, we observe that all of these limiting behaviors can arise in Generative Adversarial Networks (GAN) training and are easily demonstrated for a range of GAN problems. To explain these different behaviors, we study the high-order resolution continuous-time dynamics that correspond to each algorithm, which results in the sufficient (and almost necessary) conditions for the local convergence by each method. Moreover, this ODE perspective allows us to characterize the phase transition between these different limiting behaviors caused by introducing regularization as Hopf Bifurcations.

preprint2020arXiv

Accelerating Gradient Boosting Machine

Gradient Boosting Machine (GBM) is an extremely powerful supervised learning algorithm that is widely used in practice. GBM routinely features as a leading algorithm in machine learning competitions such as Kaggle and the KDDCup. In this work, we propose Accelerated Gradient Boosting Machine (AGBM) by incorporating Nesterov's acceleration techniques into the design of GBM. The difficulty in accelerating GBM lies in the fact that weak (inexact) learners are commonly used, and therefore the errors can accumulate in the momentum term. To overcome it, we design a "corrected pseudo residual" and fit best weak learner to this corrected pseudo residual, in order to perform the z-update. Thus, we are able to derive novel computational guarantees for AGBM. This is the first GBM type of algorithm with theoretically-justified accelerated convergence rate. Finally we demonstrate with a number of numerical experiments the effectiveness of AGBM over conventional GBM in obtaining a model with good training and/or testing data fidelity.

preprint2020arXiv

Adaptivity in Adaptive Submodularity

Adaptive sequential decision making is one of the central challenges in machine learning and artificial intelligence. In such problems, the goal is to design an interactive policy that plans for an action to take, from a finite set of $n$ actions, given some partial observations. It has been shown that in many applications such as active learning, robotics, sequential experimental design, and active detection, the utility function satisfies adaptive submodularity, a notion that generalizes the notion of diminishing returns to policies. In this paper, we revisit the power of adaptivity in maximizing an adaptive monotone submodular function. We propose an efficient semi adaptive policy that with $O(\log n \times\log k)$ adaptive rounds of observations can achieve an almost tight $1-1/e-ε$ approximation guarantee with respect to an optimal policy that carries out $k$ actions in a fully sequential manner. To complement our results, we also show that it is impossible to achieve a constant factor approximation with $o(\log n)$ adaptive rounds. We also extend our result to the case of adaptive stochastic minimum cost coverage where the goal is to reach a desired utility $Q$ with the cheapest policy. We first prove the conjecture of the celebrated work of Golovin and Krause by showing that the greedy policy achieves the asymptotically tight logarithmic approximation guarantee without resorting to stronger notions of adaptivity. We then propose a semi adaptive policy that provides the same guarantee in polylogarithmic adaptive rounds through a similar information-parallelism scheme. Our results shrink the adaptivity gap in adaptive submodular maximization by an exponential factor.

preprint2020arXiv

Bandits with adversarial scaling

We study "adversarial scaling", a multi-armed bandit model where rewards have a stochastic and an adversarial component. Our model captures display advertising where the "click-through-rate" can be decomposed to a (fixed across time) arm-quality component and a non-stochastic user-relevance component (fixed across arms). Despite the relative stochasticity of our model, we demonstrate two settings where most bandit algorithms suffer. On the positive side, we show that two algorithms, one from the action elimination and one from the mirror descent family are adaptive enough to be robust to adversarial scaling. Our results shed light on the robustness of adaptive parameter selection in stochastic bandits, which may be of independent interest.

preprint2020arXiv

Dynamic Incentive-aware Learning: Robust Pricing in Contextual Auctions

Motivated by pricing in ad exchange markets, we consider the problem of robust learning of reserve prices against strategic buyers in repeated contextual second-price auctions. Buyers' valuations for an item depend on the context that describes the item. However, the seller is not aware of the relationship between the context and buyers' valuations, i.e., buyers' preferences. The seller's goal is to design a learning policy to set reserve prices via observing the past sales data, and her objective is to minimize her regret for revenue, where the regret is computed against a clairvoyant policy that knows buyers' heterogeneous preferences. Given the seller's goal, utility-maximizing buyers have the incentive to bid untruthfully in order to manipulate the seller's learning policy. We propose learning policies that are robust to such strategic behavior. These policies use the outcomes of the auctions, rather than the submitted bids, to estimate the preferences while controlling the long-term effect of the outcome of each auction on the future reserve prices. When the market noise distribution is known to the seller, we propose a policy called Contextual Robust Pricing (CORP) that achieves a T-period regret of $O(d\log(Td) \log (T))$, where $d$ is the dimension of {the} contextual information. When the market noise distribution is unknown to the seller, we propose two policies whose regrets are sublinear in $T$.

preprint2020arXiv

Near-Optimal Massively Parallel Graph Connectivity

Identifying the connected components of a graph, apart from being a fundamental problem with countless applications, is a key primitive for many other algorithms. In this paper, we consider this problem in parallel settings. Particularly, we focus on the Massively Parallel Computations (MPC) model, which is the standard theoretical model for modern parallel frameworks such as MapReduce, Hadoop, or Spark. We consider the truly sublinear regime of MPC for graph problems where the space per machine is $n^δ$ for some desirably small constant $δ\in (0, 1)$. We present an algorithm that for graphs with diameter $D$ in the wide range $[\log^ε n, n]$, takes $O(\log D)$ rounds to identify the connected components and takes $O(\log \log n)$ rounds for all other graphs. The algorithm is randomized, succeeds with high probability, does not require prior knowledge of $D$, and uses an optimal total space of $O(m)$. We complement this by showing a conditional lower-bound based on the widely believed TwoCycle conjecture that $Ω(\log D)$ rounds are indeed necessary in this setting. Studying parallel connectivity algorithms received a resurgence of interest after the pioneering work of Andoni et al. [FOCS 2018] who presented an algorithm with $O(\log D \cdot \log \log n)$ round-complexity. Our algorithm improves this result for the whole range of values of $D$ and almost settles the problem due to the conditional lower-bound. Additionally, we show that with minimal adjustments, our algorithm can also be implemented in a variant of the (CRCW) PRAM in asymptotically the same number of rounds.

preprint2020arXiv

Regret Bounds for Batched Bandits

We present simple and efficient algorithms for the batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve over the best-known regret bounds for any number of batches. In particular, our algorithms in both settings achieve the optimal expected regrets by using only a logarithmic number of batches. We also study the batched adversarial multi-armed bandit problem for the first time and find the optimal regret, up to logarithmic factors, of any algorithm with predetermined batch sizes.

preprint2016arXiv

Optimal dynamic mechanisms with ex-post IR via bank accounts

Lately, the problem of designing multi-stage dynamic mechanisms has been shown to be both theoretically challenging and practically important. In this paper, we consider the problem of designing revenue optimal dynamic mechanism for a setting where an auctioneer sells a set of items to a buyer in multiple stages. At each stage, there could be multiple items for sale but each item can only appear in one stage. The type of the buyer at each stage is thus a multi-dimensional vector characterizing the buyer's valuations of the items at that stage and is assumed to be stage-wise independent. In particular, we propose a novel class of mechanisms called bank account mechanisms. Roughly, a bank account mechanism is no different from any stage-wise individual mechanism except for an augmented structure called bank account, a real number for each node that summarizes the history so far. We first establish that the optimal revenue from any dynamic mechanism in this setting can be achieved by a bank account mechanism, and we provide a simple characterization of the set of incentive compatible and ex-post individually rational bank account mechanisms. Based on these characterizations, we then investigate the problem of finding the (approximately) optimal bank account mechanisms. We prove that there exists a simple, randomized bank account mechanism that approximates optimal revenue up to a constant factor. Our result is general and can accommodate previous approximation results in single-shot multi-dimensional mechanism design. Based on the previous mechanism, we further show that there exists a deterministic bank account mechanism that achieves constant-factor approximation as well. Finally, we consider the problem of computing optimal mechanisms when the type space is discrete and provide an FPTAS via linear and dynamic programming.

preprint2015arXiv

Distributed Balanced Partitioning via Linear Embedding

Balanced partitioning is often a crucial first step in solving large-scale graph optimization problems, e.g., in some cases, a big graph can be chopped into pieces that fit on one machine to be processed independently before stitching the results together. In other cases, links between different parts may show up in the running time and/or network communications cost. We study a distributed balanced partitioning problem where the goal is to partition the vertices of a given graph into k pieces so as to minimize the total cut size. Our algorithm is composed of a few steps that are easily implementable in distributed computation frameworks. The algorithm first embeds nodes of the graph onto a line, and then processes nodes in a distributed manner guided by the linear embedding order. We examine various ways to find the first embedding, e.g., via a hierarchical clustering or Hilbert curves. Then we apply four different techniques including local swaps, minimum cuts on the boundaries of partitions, as well as contraction and dynamic programming. As our empirical study, we compare the above techniques with each other, and also to previous work in distributed graph algorithms, e.g., a label propagation method, FENNEL and Spinner. We report our results both on a private map graph and several public social networks, and show that our results beat previous distributed algorithms: e.g., compared to the label propagation algorithm, we report an improvement of 15-25% in the cut value. We also observe that our algorithms allow for scalable distributed implementation for any number of partitions. Finally, we apply our techniques for the Google Maps Driving Directions to minimize the number of multi-shard queries with the goal of saving in CPU usage. During live experiments, we observe an ~40% drop in the number of multi-shard queries when comparing our method with a standard geography-based method.

preprint2015arXiv

Expanders via Local Edge Flips

Designing distributed and scalable algorithms to improve network connectivity is a central topic in peer-to-peer networks. In this paper we focus on the following well-known problem: given an $n$-node $d$-regular network for $d=Ω(\log n)$, we want to design a decentralized, local algorithm that transforms the graph into one that has good connectivity properties (low diameter, expansion, etc.) without affecting the sparsity of the graph. To this end, Mahlmann and Schindelhauer introduced the random "flip" transformation, where in each time step, a random pair of vertices that have an edge decide to `swap a neighbor'. They conjectured that performing $O(n d)$ such flips at random would convert any connected $d$-regular graph into a $d$-regular expander graph, with high probability. However, the best known upper bound for the number of steps is roughly $O(n^{17} d^{23})$, obtained via a delicate Markov chain comparison argument. Our main result is to prove that a natural instantiation of the random flip produces an expander in at most $O(n^2 d^2 \sqrt{\log n})$ steps, with high probability. Our argument uses a potential-function analysis based on the matrix exponential, together with the recent beautiful results on the higher-order Cheeger inequality of graphs. We also show that our technique can be used to analyze another well-studied random process known as the `random switch', and show that it produces an expander in $O(n d)$ steps with high probability.

preprint2015arXiv

Randomized Composable Core-sets for Distributed Submodular Maximization

An effective technique for solving optimization problems over massive data sets is to partition the data into smaller pieces, solve the problem on each piece and compute a representative solution from it, and finally obtain a solution inside the union of the representative solutions for all pieces. This technique can be captured via the concept of {\em composable core-sets}, and has been recently applied to solve diversity maximization problems as well as several clustering problems. However, for coverage and submodular maximization problems, impossibility bounds are known for this technique \cite{IMMM14}. In this paper, we focus on efficient construction of a randomized variant of composable core-sets where the above idea is applied on a {\em random clustering} of the data. We employ this technique for the coverage, monotone and non-monotone submodular maximization problems. Our results significantly improve upon the hardness results for non-randomized core-sets, and imply improved results for submodular maximization in a distributed and streaming settings. In summary, we show that a simple greedy algorithm results in a $1/3$-approximate randomized composable core-set for submodular maximization under a cardinality constraint. This is in contrast to a known $O({\log k\over \sqrt{k}})$ impossibility result for (non-randomized) composable core-set. Our result also extends to non-monotone submodular functions, and leads to the first 2-round MapReduce-based constant-factor approximation algorithm with $O(n)$ total communication complexity for either monotone or non-monotone functions. Finally, using an improved analysis technique and a new algorithm $\mathsf{PseudoGreedy}$, we present an improved $0.545$-approximation algorithm for monotone submodular maximization, which is in turn the first MapReduce-based algorithm beating factor $1/2$ in a constant number of rounds.

preprint2015arXiv

Tight Bounds for Approximate Carathéodory and Beyond

We give a deterministic nearly-linear time algorithm for approximating any point inside a convex polytope with a sparse convex combination of the polytope's vertices. Our result provides a constructive proof for the Approximate Carathéodory Problem, which states that any point inside a polytope contained in the $\ell_p$ ball of radius $D$ can be approximated to within $ε$ in $\ell_p$ norm by a convex combination of only $O\left(D^2 p/ε^2\right)$ vertices of the polytope for $p \geq 2$. We also show that this bound is tight, using an argument based on anti-concentration for the binomial distribution. Along the way of establishing the upper bound, we develop a technique for minimizing norms over convex sets with complicated geometry; this is achieved by running Mirror Descent on a dual convex function obtained via Sion's Theorem. As simple extensions of our method, we then provide new algorithms for submodular function minimization and SVM training. For submodular function minimization we obtain a simplification and (provable) speed-up over Wolfe's algorithm, the method commonly found to be the fastest in practice. For SVM training, we obtain $O(1/ε^2)$ convergence for arbitrary kernels; each iteration only requires matrix-vector operations involving the kernel matrix, so we overcome the obstacle of having to explicitly store the kernel or compute its Cholesky factorization.

preprint2014arXiv

Clinching Auctions Beyond Hard Budget Constraints

Constraints on agent's ability to pay play a major role in auction design for any setting where the magnitude of financial transactions is sufficiently large. Those constraints have been traditionally modeled in mechanism design as \emph{hard budget}, i.e., mechanism is not allowed to charge agents more than a certain amount. Yet, real auction systems (such as Google AdWords) allow more sophisticated constraints on agents' ability to pay, such as \emph{average budgets}. In this work, we investigate the design of Pareto optimal and incentive compatible auctions for agents with \emph{constrained quasi-linear utilities}, which captures more realistic models of liquidity constraints that the agents may have. Our result applies to a very general class of allocation constraints known as polymatroidal environments, encompassing many settings of interest such as multi-unit auctions, matching markets, video-on-demand and advertisement systems. Our design is based Ausubel's \emph{clinching framework}. Incentive compatibility and feasibility with respect to ability-to-pay constraints are direct consequences of the clinching framework. Pareto-optimality, on the other hand, is considerably more challenging, since the no-trade condition that characterizes it depends not only on whether agents have their budgets exhausted or not, but also on prices {at} which the goods are allocated. In order to get a handle on those prices, we introduce novel concepts of dropping prices and saturation. These concepts lead to our main structural result which is a characterization of the tight sets in the clinching auction outcome and its relation to dropping prices.

preprint2014arXiv

Multiplicative Bidding in Online Advertising

In this paper, we initiate the study of the multiplicative bidding language adopted by major Internet search companies. In multiplicative bidding, the effective bid on a particular search auction is the product of a base bid and bid adjustments that are dependent on features of the search (for example, the geographic location of the user, or the platform on which the search is conducted). We consider the task faced by the advertiser when setting these bid adjustments, and establish a foundational optimization problem that captures the core difficulty of bidding under this language. We give matching algorithmic and approximation hardness results for this problem; these results are against an information-theoretic bound, and thus have implications on the power of the multiplicative bidding language itself. Inspired by empirical studies of search engine price data, we then codify the relevant restrictions of the problem, and give further algorithmic and hardness results. Our main technical contribution is an $O(\log n)$-approximation for the case of multiplicative prices and monotone values. We also provide empirical validations of our problem restrictions, and test our algorithms on real data against natural benchmarks. Our experiments show that they perform favorably compared with the baseline.

preprint2013arXiv

Local Graph Clustering Beyond Cheeger's Inequality

Motivated by applications of large-scale graph clustering, we study random-walk-based LOCAL algorithms whose running times depend only on the size of the output cluster, rather than the entire graph. All previously known such algorithms guarantee an output conductance of $\tilde{O}(\sqrt{ϕ(A)})$ when the target set $A$ has conductance $ϕ(A)\in[0,1]$. In this paper, we improve it to $$\tilde{O}\bigg( \min\Big\{\sqrt{ϕ(A)}, \frac{ϕ(A)}{\sqrt{\mathsf{Conn}(A)}} \Big\} \bigg)\enspace, $$ where the internal connectivity parameter $\mathsf{Conn}(A) \in [0,1]$ is defined as the reciprocal of the mixing time of the random walk over the induced subgraph on $A$. For instance, using $\mathsf{Conn}(A) = Ω(λ(A) / \log n)$ where $λ$ is the second eigenvalue of the Laplacian of the induced subgraph on $A$, our conductance guarantee can be as good as $\tilde{O}(ϕ(A)/\sqrt{λ(A)})$. This builds an interesting connection to the recent advance of the so-called improved Cheeger's Inequality [KKL+13], which says that global spectral algorithms can provide a conductance guarantee of $O(ϕ_{\mathsf{opt}}/\sqrt{λ_3})$ instead of $O(\sqrt{ϕ_{\mathsf{opt}}})$. In addition, we provide theoretical guarantee on the clustering accuracy (in terms of precision and recall) of the output set. We also prove that our analysis is tight, and perform empirical evaluation to support our theory on both synthetic and real data. It is worth noting that, our analysis outperforms prior work when the cluster is well-connected. In fact, the better it is well-connected inside, the more significant improvement (both in terms of conductance and accuracy) we can obtain. Our results shed light on why in practice some random-walk-based algorithms perform better than its previous theory, and help guide future research about local clustering.

preprint2012arXiv

Clinching Auctions with Online Supply

Auctions for perishable goods such as internet ad inventory need to make real-time allocation and pricing decisions as the supply of the good arrives in an online manner, without knowing the entire supply in advance. These allocation and pricing decisions get complicated when buyers have some global constraints. In this work, we consider a multi-unit model where buyers have global {\em budget} constraints, and the supply arrives in an online manner. Our main contribution is to show that for this setting there is an individually-rational, incentive-compatible and Pareto-optimal auction that allocates these units and calculates prices on the fly, without knowledge of the total supply. We do so by showing that the Adaptive Clinching Auction satisfies a {\em supply-monotonicity} property. We also analyze and discuss, using examples, how the insights gained by the allocation and payment rule can be applied to design better ad allocation heuristics in practice. Finally, while our main technical result concerns multi-unit supply, we propose a formal model of online supply that captures scenarios beyond multi-unit supply and has applications to sponsored search. We conjecture that our results for multi-unit auctions can be extended to these more general models.

preprint2012arXiv

On the Implications of Lookahead Search in Game Playing

Lookahead search is perhaps the most natural and widely used game playing strategy. Given the practical importance of the method, the aim of this paper is to provide a theoretical performance examination of lookahead search in a wide variety of applications. To determine a strategy play using lookahead search}, each agent predicts multiple levels of possible re-actions to her move (via the use of a search tree), and then chooses the play that optimizes her future payoff accounting for these re-actions. There are several choices of optimization function the agents can choose, where the most appropriate choice of function will depend on the specifics of the actual game - we illustrate this in our examples. Furthermore, the type of search tree chosen by computationally-constrained agent can vary. We focus on the case where agents can evaluate only a bounded number, $k$, of moves into the future. That is, we use depth $k$ search trees and call this approach {\em k-lookahead search}. We apply our method in five well-known settings: AdWord auctions; industrial organization (Cournot's model); congestion games; valid-utility games and basic-utility games; cost-sharing network design games. We consider two questions. First, what is the expected social quality of outcome when agents apply lookahead search? Second, what interactive behaviours can be exhibited when players use lookahead search?

preprint2012arXiv

Polyhedral Clinching Auctions and the Adwords Polytope

A central issue in applying auction theory in practice is the problem of dealing with budget-constrained agents. A desirable goal in practice is to design incentive compatible, individually rational, and Pareto optimal auctions while respecting the budget constraints. Achieving this goal is particularly challenging in the presence of nontrivial combinatorial constraints over the set of feasible allocations. Toward this goal and motivated by AdWords auctions, we present an auction for {\em polymatroidal} environments satisfying the above properties. Our auction employs a novel clinching technique with a clean geometric description and only needs an oracle access to the submodular function defining the polymatroid. As a result, this auction not only simplifies and generalizes all previous results, it applies to several new applications including AdWords Auctions, bandwidth markets, and video on demand. In particular, our characterization of the AdWords auction as polymatroidal constraints might be of independent interest. This allows us to design the first mechanism for Ad Auctions taking into account simultaneously budgets, multiple keywords and multiple slots. We show that it is impossible to extend this result to generic polyhedral constraints. This also implies an impossibility result for multi-unit auctions with decreasing marginal utilities in the presence of budget constraints.

preprint2012arXiv

Yield Optimization of Display Advertising with Ad Exchange

In light of the growing market of Ad Exchanges for the real-time sale of advertising slots, publishers face new challenges in choosing between the allocation of contract-based reservation ads and spot market ads. In this setting, the publisher should take into account the tradeoff between short-term revenue from an Ad Exchange and quality of allocating reservation ads. In this paper, we formalize this combined optimization problem as a stochastic control problem and derive an efficient policy for online ad allocation in settings with general joint distribution over placement quality and exchange bids. We prove asymptotic optimality of this policy in terms of any trade-off between quality of delivered reservation ads and revenue from the exchange, and provide a rigorous bound for its convergence rate to the optimal policy. We also give experimental results on data derived from real publisher inventory, showing that our policy can achieve any pareto-optimal point on the quality vs. revenue curve. Finally, we study a parametric training-based algorithm in which instead of learning the dual variables from a sample data (as is done in non-parametric training-based algorithms), we learn the parameters of the distribution and construct those dual variables from the learned parameter values. We compare parametric and non-parametric ways to estimate from data both analytically and experimentally in the special case without the ad exchange, and show that though both methods converge to the optimal policy as the sample size grows, our parametric method converges faster, and thus performs better on smaller samples.

preprint2011arXiv

On the Non-Progressive Spread of Influence through Social Networks

The spread of influence in social networks is studied in two main categories: the progressive model and the non-progressive model (see e.g. the seminal work of Kempe, Kleinberg, and Tardos in KDD 2003). While the progressive models are suitable for modeling the spread of influence in monopolistic settings, non-progressive are more appropriate for modeling non-monopolistic settings, e.g., modeling diffusion of two competing technologies over a social network. Despite the extensive work on the progressive model, non-progressive models have not been studied well. In this paper, we study the spread of influence in the non-progressive model under the strict majority threshold: given a graph $G$ with a set of initially infected nodes, each node gets infected at time $τ$ iff a majority of its neighbors are infected at time $τ-1$. Our goal in the \textit{MinPTS} problem is to find a minimum-cardinality initial set of infected nodes that would eventually converge to a steady state where all nodes of $G$ are infected. We prove that while the MinPTS is NP-hard for a restricted family of graphs, it admits an improved constant-factor approximation algorithm for power-law graphs. We do so by proving lower and upper bounds in terms of the minimum and maximum degree of nodes in the graph. The upper bound is achieved in turn by applying a natural greedy algorithm. Our experimental evaluation of the greedy algorithm also shows its superior performance compared to other algorithms for a set of real-world graphs as well as the random power-law graphs. Finally, we study the convergence properties of these algorithms and show that the non-progressive model converges in at most $O(|E(G)|)$ steps.

preprint2010arXiv

Inner Product Spaces for MinSum Coordination Mechanisms

We study policies aiming to minimize the weighted sum of completion times of jobs in the context of coordination mechanisms for selfish scheduling problems. Our goal is to design local policies that achieve a good price of anarchy in the resulting equilibria for unrelated machine scheduling. To obtain the approximation bounds, we introduce a new technique that while conceptually simple, seems to be quite powerful. With this method we are able to prove the following results. First, we consider Smith's Rule, which orders the jobs on a machine in ascending processing time to weight ratio, and show that it achieves an approximation ratio of 4. We also demonstrate that this is the best possible for deterministic non-preemptive strongly local policies. Since Smith's Rule is always optimal for a given assignment, this may seem unsurprising, but we then show that better approximation ratios can be obtained if either preemption or randomization is allowed. We prove that ProportionalSharing, a preemptive strongly local policy, achieves an approximation ratio of 2.618 for the weighted sum of completion times, and an approximation ratio of 2.5 in the unweighted case. Again, we observe that these bounds are tight. Next, we consider Rand, a natural non-preemptive but randomized policy. We show that it achieves an approximation ratio of at most 2.13; moreover, if the sum of the weighted completion times is negligible compared to the cost of the optimal solution, this improves to π/2. Finally, we show that both ProportionalSharing and Rand induce potential games, and thus always have a pure Nash equilibrium (unlike Smith's Rule). This also allows us to design the first \emph{combinatorial} constant-factor approximation algorithm minimizing weighted completion time for unrelated machine scheduling that achieves a factor of 2+ εfor any ε> 0.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint

Fields this researcher appears in

Source provenance