Researcher profile

Konstantin Makarychev

Konstantin Makarychev contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
24works
0followers
11topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

24 published item(s)

preprint2022arXiv

Explainable k-means. Don't be greedy, plant bigger trees!

We provide a new bi-criteria $\tilde{O}(\log^2 k)$ competitive algorithm for explainable $k$-means clustering. Explainable $k$-means was recently introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). It is described by an easy to interpret and understand (threshold) decision tree or diagram. The cost of the explainable $k$-means clustering equals to the sum of costs of its clusters; and the cost of each cluster equals the sum of squared distances from the points in the cluster to the center of that cluster. The best non bi-criteria algorithm for explainable clustering $\tilde{O}(k)$ competitive, and this bound is tight. Our randomized bi-criteria algorithm constructs a threshold decision tree that partitions the data set into $(1+δ)k$ clusters (where $δ\in (0,1)$ is a parameter of the algorithm). The cost of this clustering is at most $\tilde{O}(1/ δ\cdot \log^2 k)$ times the cost of the optimal unconstrained $k$-means clustering. We show that this bound is almost optimal.

preprint2021arXiv

Batch Optimization for DNA Synthesis

Large pools of synthetic DNA molecules have been recently used to reliably store significant volumes of digital data. While DNA as a storage medium has enormous potential because of its high storage density, its practical use is currently severely limited because of the high cost and low throughput of available DNA synthesis technologies. We study the role of batch optimization in reducing the cost of large scale DNA synthesis, which translates to the following algorithmic task. Given a large pool $\mathcal{S}$ of random quaternary strings of fixed length, partition $\mathcal{S}$ into batches in a way that minimizes the sum of the lengths of the shortest common supersequences across batches. We introduce two ideas for batch optimization that both improve (in different ways) upon a naive baseline: (1) using both $(ACGT)^{*}$ and its reverse $(TGCA)^{*}$ as reference strands, and batching appropriately, and (2) batching via the quantiles of an appropriate ordering of the strands. We also prove asymptotically matching lower bounds on the cost of DNA synthesis, showing that one cannot improve upon these two ideas. Our results uncover a surprising separation between two cases that naturally arise in the context of DNA data storage: the asymptotic cost savings of batch optimization are significantly greater in the case where strings in $\mathcal{S}$ do not contain repeats of the same character (homopolymers), as compared to the case where strings in $\mathcal{S}$ are unconstrained.

preprint2020arXiv

Performance of Johnson-Lindenstrauss Transform for k-Means and k-Medians Clustering

Consider an instance of Euclidean $k$-means or $k$-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of $(1+\varepsilon)$ under a projection onto a random $O(\log(k / \varepsilon) / \varepsilon^2)$-dimensional subspace. Further, the cost of every clustering is preserved within $(1+\varepsilon)$. More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean $k$-clustering with the distances raised to the $p$-th power for any constant $p$. For $k$-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for $k$-medians, it answers a question raised by Kannan.

preprint2015arXiv

A bi-criteria approximation algorithm for $k$ Means

We consider the classical $k$-means clustering problem in the setting bi-criteria approximation, in which an algoithm is allowed to output $βk > k$ clusters, and must produce a clustering with cost at most $α$ times the to the cost of the optimal set of $k$ clusters. We argue that this approach is natural in many settings, for which the exact number of clusters is a priori unknown, or unimportant up to a constant factor. We give new bi-criteria approximation algorithms, based on linear programming and local search, respectively, which attain a guarantee $α(β)$ depending on the number $βk$ of clusters that may be opened. Our gurantee $α(β)$ is always at most $9 + ε$ and improves rapidly with $β$ (for example: $α(2)<2.59$, and $α(3) < 1.4$). Moreover, our algorithms have only polynomial dependence on the dimension of the input data, and so are applicable in high-dimensional settings.

preprint2015arXiv

Correlation Clustering with Noisy Partial Information

In this paper, we propose and study a semi-random model for the Correlation Clustering problem on arbitrary graphs G. We give two approximation algorithms for Correlation Clustering instances from this model. The first algorithm finds a solution of value $(1+ δ) optcost + O_δ(n\log^3 n)$ with high probability, where $optcost$ is the value of the optimal solution (for every $δ> 0$). The second algorithm finds the ground truth clustering with an arbitrarily small classification error $η$ (under some additional assumptions on the instance).

preprint2015arXiv

Near Optimal LP Rounding Algorithm for Correlation Clustering on Complete and Complete k-partite Graphs

We give new rounding schemes for the standard linear programming relaxation of the correlation clustering problem, achieving approximation factors almost matching the integrality gaps: - For complete graphs our appoximation is $2.06 - \varepsilon$ for a fixed constant $\varepsilon$, which almost matches the previously known integrality gap of $2$. - For complete $k$-partite graphs our approximation is $3$. We also show a matching integrality gap. - For complete graphs with edge weights satisfying triangle inequalities and probability constraints, our approximation is $1.5$, and we show an integrality gap of $1.2$. Our results improve a long line of work on approximation algorithms for correlation clustering in complete graphs, previously culminating in a ratio of $2.5$ for the complete case by Ailon, Charikar and Newman (JACM&#39;08). In the weighted complete case satisfying triangle inequalities and probability constraints, the same authors give a $2$-approximation; for the bipartite case, Ailon, Avigdor-Elgrabli, Liberty and van Zuylen give a $4$-approximation (SICOMP&#39;12).

preprint2015arXiv

Optimization Problems with Diseconomies of Scale via Decoupling

We present a new framework for solving optimization problems with a diseconomy of scale. In such problems, our goal is to minimize the cost of resources used to perform a certain task. The cost of resources grows superlinearly, as $x^q$, $q\ge 1$, with the amount $x$ of resources used. We define a novel linear programming relaxation for such problems, and then show that the integrality gap of the relaxation is $A_q$, where $A_q$ is the $q$-th moment of the Poisson random variable with parameter 1. Using our framework, we obtain approximation algorithms for the Minimum Energy Efficient Routing, Minimum Degree Balanced Spanning Tree, Load Balancing on Unrelated Parallel Machines, and Unrelated Parallel Machine Scheduling with Nonlinear Functions of Completion Times problems. Our analysis relies on the decoupling inequality for nonnegative random variables. The inequality states that $$\big \|\sum_{i=1}^n X_i\big\|_{q} \leq C_q \,\big \|\sum_{i=1}^n Y_i\big\|_{q},$$ where $X_i$ are independent nonnegative random variables, $Y_i$ are possibly dependent nonnegative random variable, and each $Y_i$ has the same distribution as $X_i$. The inequality was proved by de la Peña in 1990. De la Peña, Ibragimov, and Sharakhmetov (2003) showed that $C_q\leq 2$ for $q\in (1,2)$ and $C_q\leq A_q^{1/q}$ for $q\geq 2$. We show that the optimal constant is $C_q=A_q^{1/q}$ for any $q\geq 1$. We then prove a more general inequality: For every convex function $φ$, $$\mathbb{E}[φ\Big(\sum_{i=1}^n X_i\Big)]\leq \mathbb{E}[φ\Big(P\sum_{i=1}^n Y_i\Big)],$$ and, for every concave function $ψ$, $$\mathbb{E}[ψ\Big(\sum_{i=1}^n X_i\Big)] \geq \mathbb{E}[ψ\Big(P\sum_{i=1}^n Y_i\Big)],$$ where $P$ is a Poisson random variable with parameter 1 independent of the random variables $Y_i$.

preprint2015arXiv

Satisfiability of Ordering CSPs Above Average

We study the satisfiability of ordering constraint satisfaction problems (CSPs) above average. We prove the conjecture of Gutin, van Iersel, Mnich, and Yeo that the satisfiability above average of ordering CSPs of arity $k$ is fixed-parameter tractable for every $k$. Previously, this was only known for $k=2$ and $k=3$. We also generalize this result to more general classes of CSPs, including CSPs with predicates defined by linear inequalities. To obtain our results, we prove a new Bonami-type inequality for the Efron-Stein decomposition. The inequality applies to functions defined on arbitrary product probability spaces. In contrast to other variants of the Bonami Inequality, it does not depend on the mass of the smallest atom in the probability space. We believe that this inequality is of independent interest.

preprint2014arXiv

Constant Factor Approximation for Balanced Cut in the PIE model

We propose and study a new semi-random semi-adversarial model for Balanced Cut, a planted model with permutation-invariant random edges (PIE). Our model is much more general than planted models considered previously. Consider a set of vertices V partitioned into two clusters $L$ and $R$ of equal size. Let $G$ be an arbitrary graph on $V$ with no edges between $L$ and $R$. Let $E_{random}$ be a set of edges sampled from an arbitrary permutation-invariant distribution (a distribution that is invariant under permutation of vertices in $L$ and in $R$). Then we say that $G + E_{random}$ is a graph with permutation-invariant random edges. We present an approximation algorithm for the Balanced Cut problem that finds a balanced cut of cost $O(|E_{random}|) + n \text{polylog}(n)$ in this model. In the regime when $|E_{random}| = Ω(n \text{polylog}(n))$, this is a constant factor approximation with respect to the cost of the planted cut.

preprint2014arXiv

Maximum Quadratic Assignment Problem: Reduction from Maximum Label Cover and LP-based Approximation Algorithm

We show that for every positive $ε> 0$, unless NP $\subset$ BPQP, it is impossible to approximate the maximum quadratic assignment problem within a factor better than $2^{\log^{1-ε} n}$ by a reduction from the maximum label cover problem. Our result also implies that Approximate Graph Isomorphism is not robust and is in fact, $1 - ε$ vs $ε$ hard assuming the Unique Games Conjecture. Then, we present an $O(\sqrt{n})$-approximation algorithm for the problem based on rounding of the linear programming relaxation often used in the state of the art exact algorithms.

preprint2014arXiv

Nonuniform Graph Partitioning with Unrelated Weights

We give a bi-criteria approximation algorithm for the Minimum Nonuniform Partitioning problem, recently introduced by Krauthgamer, Naor, Schwartz and Talwar (2014). In this problem, we are given a graph $G=(V,E)$ on $n$ vertices and $k$ numbers $ρ_1,\dots, ρ_k$. The goal is to partition the graph into $k$ disjoint sets $P_1,\dots, P_k$ satisfying $|P_i|\leq ρ_i n$ so as to minimize the number of edges cut by the partition. Our algorithm has an approximation ratio of $O(\sqrt{\log n \log k})$ for general graphs, and an $O(1)$ approximation for graphs with excluded minors. This is an improvement upon the $O(\log n)$ algorithm of Krauthgamer, Naor, Schwartz and Talwar (2014). Our approximation ratio matches the best known ratio for the Minimum (Uniform) $k$-Partitioning problem. We extend our results to the case of &#34;unrelated weights&#34; and to the case of &#34;unrelated $d$-dimensional weights&#34;. In the former case, different vertices may have different weights and the weight of a vertex may depend on the set $P_i$ the vertex is assigned to. In the latter case, each vertex $u$ has a $d$-dimensional weight $r(u,i) = (r_1(u,i), \dots, r_d(u,i))$ if $u$ is assigned to $P_i$. Each set $P_i$ has a $d$-dimensional capacity $c(i) = (c_1(i),\dots, c_d(i))$. The goal is to find a partition such that $\sum_{u\in {P_i}} r(u,i) \leq c(i)$ coordinate-wise.

preprint2014arXiv

Online Algorithms for Machine Minimization

In this paper, we consider the online version of the machine minimization problem (introduced by Chuzhoy et al., FOCS 2004), where the goal is to schedule a set of jobs with release times, deadlines, and processing lengths on a minimum number of identical machines. Since the online problem has strong lower bounds if all the job parameters are arbitrary, we focus on jobs with uniform length. Our main result is a complete resolution of the deterministic complexity of this problem by showing that a competitive ratio of $e$ is achievable and optimal, thereby improving upon existing lower and upper bounds of 2.09 and 5.2 respectively. We also give a constant-competitive online algorithm for the case of uniform deadlines (but arbitrary job lengths); to the best of our knowledge, no such algorithm was known previously. Finally, we consider the complimentary problem of throughput maximization where the goal is to maximize the sum of weights of scheduled jobs on a fixed set of identical machines (introduced by Bar-Noy et al. STOC 1999). We give a randomized online algorithm for this problem with a competitive ratio of e/e-1; previous results achieved this bound only for the case of a single machine or in the limit of an infinite number of machines.

preprint2014arXiv

Precedence-constrained Scheduling of Malleable Jobs with Preemption

Scheduling jobs with precedence constraints on a set of identical machines to minimize the total processing time (makespan) is a fundamental problem in combinatorial optimization. In practical settings such as cloud computing, jobs are often malleable, i.e., can be processed on multiple machines simultaneously. The instantaneous processing rate of a job is a non-decreasing function of the number of machines assigned to it (we call it the processing function). Previous research has focused on practically relevant concave processing functions, which obey the law of diminishing utility and generalize the classical (non-malleable) problem. Our main result is a $(2+ε)$-approximation algorithm for concave processing functions (for any $ε> 0$), which is the best possible under complexity theoretic assumptions. The approximation ratio improves to $(1 + ε)$ for the interesting and practically relevant special case of power functions, i.e., $p_j(z) = c_j \cdot z^γ$.

preprint2013arXiv

Approximation Algorithm for Sparsest k-Partitioning

Given a graph $G$, the sparsest-cut problem asks to find the set of vertices $S$ which has the least expansion defined as $$ϕ_G(S) := \frac{w(E(S,\bar{S}))}{\min \set{w(S), w(\bar{S})}}, $$ where $w$ is the total edge weight of a subset. Here we study the natural generalization of this problem: given an integer $k$, compute a $k$-partition $\set{P_1, \ldots, P_k}$ of the vertex set so as to minimize $$ ϕ_k(\set{P_1, \ldots, P_k}) := \max_i ϕ_G(P_i). $$ Our main result is a polynomial time bi-criteria approximation algorithm which outputs a $(1 - \e)k$-partition of the vertex set such that each piece has expansion at most $O_{\varepsilon}(\sqrt{\log n \log k})$ times $OPT$. We also study balanced versions of this problem.

preprint2013arXiv

Bilu-Linial Stable Instances of Max Cut and Minimum Multiway Cut

We investigate the notion of stability proposed by Bilu and Linial. We obtain an exact polynomial-time algorithm for $γ$-stable Max Cut instances with $γ\geq c\sqrt{\log n}\log\log n$ for some absolute constant $c > 0$. Our algorithm is robust: it never returns an incorrect answer; if the instance is $γ$-stable, it finds the maximum cut, otherwise, it either finds the maximum cut or certifies that the instance is not $γ$-stable. We prove that there is no robust polynomial-time algorithm for $γ$-stable instances of Max Cut when $γ< α_{SC}(n/2)$, where $α_{SC}$ is the best approximation factor for Sparsest Cut with non-uniform demands. Our algorithm is based on semidefinite programming. We show that the standard SDP relaxation for Max Cut (with $\ell_2^2$ triangle inequalities) is integral if $γ\geq D_{\ell_2^2\to \ell_1}(n)$, where $D_{\ell_2^2\to \ell_1}(n)$ is the least distortion with which every $n$ point metric space of negative type embeds into $\ell_1$. On the negative side, we show that the SDP relaxation is not integral when $γ< D_{\ell_2^2\to \ell_1}(n/2)$. Moreover, there is no tractable convex relaxation for $γ$-stable instances of Max Cut when $γ< α_{SC}(n/2)$. That suggests that solving $γ$-stable instances with $γ=o(\sqrt{\log n})$ might be difficult or impossible. Our results significantly improve previously known results. The best previously known algorithm for $γ$-stable instances of Max Cut required that $γ\geq c\sqrt{n}$ (for some $c > 0$) [Bilu, Daniely, Linial, and Saks]. No hardness results were known for the problem. Additionally, we present an algorithm for 4-stable instances of Minimum Multiway Cut. We also study a relaxed notion of weak stability.

preprint2013arXiv

Local Search is Better than Random Assignment for Bounded Occurrence Ordering k-CSPs

We prove that the Bounded Occurrence Ordering k-CSP Problem is not approximation resistant. We give a very simple local search algorithm that always performs better than the random assignment algorithm. Specifically, the expected value of the solution returned by the algorithm is at least Alg > Avg + a(B,k) (Opt - Avg), where &#34;Opt&#34; is the value of the optimal solution; &#34;Avg&#34; is the expected value of the random solution; and a(B,k)=Omega_k(B^{-(k+O(1))} is a parameter depending only on &#34;k&#34; (the arity of the CSP) and &#34;B&#34; (the maximum number of times each variable is used in constraints). The question whether bounded occurrence ordering k-CSPs are approximation resistant was raised by Guruswami and Zhou (APPROX 2012) who recently showed that bounded occurrence 3-CSPs and &#34;monotone&#34; k-CSPs admit a non-trivial approximation.

preprint2012arXiv

Approximation Algorithm for Non-Boolean MAX k-CSP

In this paper, we present a randomized polynomial-time approximation algorithm for k-CSPd. In k-CSPd, we are given a set of predicates of arity k over an alphabet of size d. Our goal is to find an assignment that maximizes the number of satisfied constraints. Our algorithm has approximation factor Omega(kd/d^k) (when k > Ω(log d)). This bound is asymptotically optimal assuming the Unique Games Conjecture. The best previously known algorithm has approximation factor Omega(k log d/d^k). We also give an approximation algorithm for the boolean MAX k-CSP2 problem with a slightly improved approximation guarantee.

preprint2012arXiv

Approximation Algorithms for Semi-random Graph Partitioning Problems

In this paper, we propose and study a new semi-random model for graph partitioning problems. We believe that it captures many properties of real--world instances. The model is more flexible than the semi-random model of Feige and Kilian and planted random model of Bui, Chaudhuri, Leighton and Sipser. We develop a general framework for solving semi-random instances and apply it to several problems of interest. We present constant factor bi-criteria approximation algorithms for semi-random instances of the Balanced Cut, Multicut, Min Uncut, Sparsest Cut and Small Set Expansion problems. We also show how to almost recover the optimal solution if the instance satisfies an additional expanding condition. Our algorithms work in a wider range of parameters than most algorithms for previously studied random and semi-random models. Additionally, we study a new planted algebraic expander model and develop constant factor bi-criteria approximation algorithms for graph partitioning problems in this model.

preprint2011arXiv

How to Play Unique Games against a Semi-Random Adversary

In this paper, we study the average case complexity of the Unique Games problem. We propose a natural semi-random model, in which a unique game instance is generated in several steps. First an adversary selects a completely satisfiable instance of Unique Games, then she chooses an epsilon-fraction of all edges, and finally replaces (&#34;corrupts&#34;) the constraints corresponding to these edges with new constraints. If all steps are adversarial, the adversary can obtain any (1-epsilon) satisfiable instance, so then the problem is as hard as in the worst case. In our semi-random model, one of the steps is random, and all other steps are adversarial. We show that known algorithms for unique games (in particular, all algorithms that use the standard SDP relaxation) fail to solve semi-random instances of Unique Games. We present an algorithm that with high probability finds a solution satisfying a (1-delta) fraction of all constraints in semi-random instances (we require that the average degree of the graph is Omega(log k). To this end, we consider a new non-standard SDP program for Unique Games, which is not a relaxation for the problem, and show how to analyze it. We present a new rounding scheme that simultaneously uses SDP and LP solutions, which we believe is of independent interest. Our result holds only for epsilon less than some absolute constant. We prove that if epsilon > 1/2, then the problem is hard in one of the models, the result assumes the 2-to-2 conjecture. Finally, we study semi-random instances of Unique Games that are at most (1-epsilon) satisfiable. We present an algorithm that with high probability, distinguishes between the case when the instance is a semi-random instance and the case when the instance is an (arbitrary) (1-delta) satisfiable instance if epsilon > c delta.

preprint2011arXiv

Min-Max Graph Partitioning and Small Set Expansion

We study graph partitioning problems from a min-max perspective, in which an input graph on n vertices should be partitioned into k parts, and the objective is to minimize the maximum number of edges leaving a single part. The two main versions we consider are where the k parts need to be of equal-size, and where they must separate a set of k given terminals. We consider a common generalization of these two problems, and design for it an $O(\sqrt{\log n\log k})$-approximation algorithm. This improves over an $O(\log^2 n)$ approximation for the second version, and roughly $O(k\log n)$ approximation for the first version that follows from other previous work. We also give an improved O(1)-approximation algorithm for graphs that exclude any fixed minor. Our algorithm uses a new procedure for solving the Small-Set Expansion problem. In this problem, we are given a graph G and the goal is to find a non-empty set $S\subseteq V$ of size $|S| \leq ρn$ with minimum edge-expansion. We give an $O(\sqrt{\log{n}\log{(1/ρ)}})$ bicriteria approximation algorithm for the general case of Small-Set Expansion, and O(1) approximation algorithm for graphs that exclude any fixed minor.

preprint2011arXiv

On Parsimonious Explanations for 2-D Tree- and Linearly-Ordered Data

This paper studies the &#34;explanation problem&#34; for tree- and linearly-ordered array data, a problem motivated by database applications and recently solved for the one-dimensional tree-ordered case. In this paper, one is given a matrix A whose rows and columns have semantics: special subsets of the rows and special subsets of the columns are meaningful, others are not. A submatrix in A is said to be meaningful if and only if it is the cross product of a meaningful row subset and a meaningful column subset, in which case we call it an &#34;allowed rectangle.&#34; The goal is to &#34;explain&#34; A as a sparse sum of weighted allowed rectangles. Specifically, we wish to find as few weighted allowed rectangles as possible such that, for all i,j, a_{ij} equals the sum of the weights of all rectangles which include cell (i,j). In this paper we consider the natural cases in which the matrix dimensions are tree-ordered or linearly-ordered. In the tree-ordered case, we are given a rooted tree T1 whose leaves are the rows of A and another, T2, whose leaves are the columns. Nodes of the trees correspond in an obvious way to the sets of their leaf descendants. In the linearly-ordered case, a set of rows or columns is meaningful if and only if it is contiguous. For tree-ordered data, we prove the explanation problem NP-Hard and give a randomized 2-approximation algorithm for it. For linearly-ordered data, we prove the explanation problem NP-Hard and give a 2.56-approximation algorithm. To our knowledge, these are the first results for the problem of sparsely and exactly representing matrices by weighted rectangles.

preprint2010arXiv

Metric Extension Operators, Vertex Sparsifiers and Lipschitz Extendability

We study vertex cut and flow sparsifiers that were recently introduced by Moitra, and Leighton and Moitra. We improve and generalize their results. We give a new polynomial-time algorithm for constructing O(log k / log log k) cut and flow sparsifiers, matching the best existential upper bound on the quality of a sparsifier, and improving the previous algorithmic upper bound of O(log^2 k / log log k). We show that flow sparsifiers can be obtained from linear operators approximating minimum metric extensions. We introduce the notion of (linear) metric extension operators, prove that they exist, and give an exact polynomial-time algorithm for finding optimal operators. We then establish a direct connection between flow and cut sparsifiers and Lipschitz extendability of maps in Banach spaces, a notion studied in functional analysis since 1930s. Using this connection, we prove a lower bound of Omega(sqrt{log k/log log k}) for flow sparsifiers and a lower bound of Omega(sqrt{log k}/log log k) for cut sparsifiers. We show that if a certain open question posed by Ball in 1992 has a positive answer, then there exist \tilde O(sqrt{log k}) cut sparsifiers. On the other hand, any lower bound on cut sparsifiers better than \tilde Omega(sqrt{log k}) would imply a negative answer to this question.