Researcher profile

Michael Saks

Michael Saks contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2021arXiv

Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time

Edit distance is a measure of similarity of two strings based on the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. The edit distance can be computed exactly using a dynamic programming algorithm that runs in quadratic time. Andoni, Krauthgamer, and Onak (2010) gave a nearly linear time algorithm that approximates edit distance within an approximation factor $\text{poly}(\log n)$. In this paper, we provide an algorithm with running time $\tilde{O}(n^{2-2/7})$ that approximates the edit distance within a constant factor.

preprint2020arXiv

On the rational relationships among pseudo-roots of a non-commutative polynomial

For a non-commutative ring R, we consider factorizations of polynomials in R[t] where t is a central variable. A pseudo-root of a polynomial p(t) is an element x in R, for which there exist polynomials q(t) and s(t) such that p(t)=q(t)(t-x)s(t). We investigate the rational relationships that hold among the pseudo-roots of p(t) by using the diamond operations for cover graphs of modular lattices.

preprint2013arXiv

A Polynomial Time Algorithm for Lossy Population Recovery

We give a polynomial time algorithm for the lossy population recovery problem. In this problem, the goal is to approximately learn an unknown distribution on binary strings of length $n$ from lossy samples: for some parameter $μ$ each coordinate of the sample is preserved with probability $μ$ and otherwise is replaced by a `?'. The running time and number of samples needed for our algorithm is polynomial in $n$ and $1/\varepsilon$ for each fixed $μ>0$. This improves on algorithm of Wigderson and Yehudayoff that runs in quasi-polynomial time for any $μ> 0$ and the polynomial time algorithm of Dvir et al which was shown to work for $μ\gtrapprox 0.30$ by Batman et al. In fact, our algorithm also works in the more general framework of Batman et al. in which there is no a priori bound on the size of the support of the distribution. The algorithm we analyze is implicit in previous work; our main contribution is to analyze the algorithm by showing (via linear programming duality and connections to complex analysis) that a certain matrix associated with the problem has a robust local inverse even though its condition number is exponentially small. A corollary of our result is the first polynomial time algorithm for learning DNFs in the restriction access model of Dvir et al.

preprint2013arXiv

Composition limits and separating examples for some Boolean function complexity measures

Block sensitivity ($bs(f)$), certificate complexity ($C(f)$) and fractional certificate complexity ($C^*(f)$) are three fundamental combinatorial measures of complexity of a boolean function $f$. It has long been known that $bs(f) \leq C^{\ast}(f) \leq C(f) =O(bs(f)^2)$. We provide an infinite family of examples for which $C(f)$ grows quadratically in $C^{\ast}(f)$ (and also $bs(f)$) giving optimal separations between these measures. Previously the biggest separation known was $C(f)=C^{\ast}(f)^{\log_{4.5}5}$. We also give a family of examples for which $C^{\ast}(f)=Ω(bs(f)^{3/2})$. These examples are obtained by composing boolean functions in various ways. Here the composition $f \circ g$ of $f$ with $g$ is obtained by substituting for each variable of $f$ a copy of $g$ on disjoint sets of variables. To construct and analyse these examples we systematically investigate the behaviour under function composition of these measures and also the sensitivity measure $s(f)$. The measures $s(f)$, $C(f)$ and $C^{\ast}(f)$ behave nicely under composition: they are submultiplicative (where measure $m$ is submultiplicative if $m(f \circ g) \leq m(f)m(g)$) with equality holding under some fairly general conditions. The measure $bs(f)$ is qualitatively different: it is not submultiplicative. This qualitative difference was not noticed in the previous literature and we correct some errors that appeared in previous papers. We define the composition limit of a measure $m$ at function $f$, $m^{\lim}(f)$ to be the limit as $k$ grows of $m(f^{(k)})^{1/k}$, where $f^{(k)}$ is the iterated composition of $f$ with itself $k$-times. For any function $f$ we show that $bs^{\lim}(f) = (C^*)^{\lim}(f)$ and characterize $s^{\lim}(f), (C^*)^{\lim}(f)$, and $C^{\lim}(f)$ in terms of the largest eigenvalue of a certain set of $2\times 2$ matrices associated with $f$.

preprint2012arXiv

Clustering is difficult only when it does not matter

Numerous papers ask how difficult it is to cluster data. We suggest that the more relevant and interesting question is how difficult it is to cluster data sets {\em that can be clustered well}. More generally, despite the ubiquity and the great importance of clustering, we still do not have a satisfactory mathematical theory of clustering. In order to properly understand clustering, it is clearly necessary to develop a solid theoretical basis for the area. For example, from the perspective of computational complexity theory the clustering problem seems very hard. Numerous papers introduce various criteria and numerical measures to quantify the quality of a given clustering. The resulting conclusions are pessimistic, since it is computationally difficult to find an optimal clustering of a given data set, if we go by any of these popular criteria. In contrast, the practitioners' perspective is much more optimistic. Our explanation for this disparity of opinions is that complexity theory concentrates on the worst case, whereas in reality we only care for data sets that can be clustered well. We introduce a theoretical framework of clustering in metric spaces that revolves around a notion of "good clustering". We show that if a good clustering exists, then in many cases it can be efficiently found. Our conclusion is that contrary to popular belief, clustering should not be considered a hard task.

preprint2012arXiv

On Online Labeling with Polynomially Many Labels

In the online labeling problem with parameters n and m we are presented with a sequence of n keys from a totally ordered universe U and must assign each arriving key a label from the label set {1,2,...,m} so that the order of labels (strictly) respects the ordering on U. As new keys arrive it may be necessary to change the labels of some items; such changes may be done at any time at unit cost for each change. The goal is to minimize the total cost. An alternative formulation of this problem is the file maintenance problem, in which the items, instead of being labeled, are maintained in sorted order in an array of length m, and we pay unit cost for moving an item. For the case m=cn for constant c>1, there are known algorithms that use at most O(n log(n)^2) relabelings in total [Itai, Konheim, Rodeh, 1981], and it was shown recently that this is asymptotically optimal [Bulánek, Koucký, Saks, 2012]. For the case of m=Θ(n^C) for C>1, algorithms are known that use O(n log n) relabelings. A matching lower bound was claimed in [Dietz, Seiferas, Zhang, 2004]. That proof involved two distinct steps: a lower bound for a problem they call prefix bucketing and a reduction from prefix bucketing to online labeling. The reduction seems to be incorrect, leaving a (seemingly significant) gap in the proof. In this paper we close the gap by presenting a correct reduction to prefix bucketing. Furthermore we give a simplified and improved analysis of the prefix bucketing lower bound. This improvement allows us to extend the lower bounds for online labeling to the case where the number m of labels is superpolynomial in n. In particular, for superpolynomial m we get an asymptotically optimal lower bound Ω((n log n) / (log log m - log log n)).

preprint2012arXiv

On the practically interesting instances of MAXCUT

The complexity of a computational problem is traditionally quantified based on the hardness of its worst case. This approach has many advantages and has led to a deep and beautiful theory. However, from the practical perspective, this leaves much to be desired. In application areas, practically interesting instances very often occupy just a tiny part of an algorithm's space of instances, and the vast majority of instances are simply irrelevant. Addressing these issues is a major challenge for theoretical computer science which may make theory more relevant to the practice of computer science. Following Bilu and Linial, we apply this perspective to MAXCUT, viewed as a clustering problem. Using a variety of techniques, we investigate practically interesting instances of this problem. Specifically, we show how to solve in polynomial time distinguished, metric, expanding and dense instances of MAXCUT under mild stability assumptions. In particular, $(1+ε)$-stability (which is optimal) suffices for metric and dense MAXCUT. We also show how to solve in polynomial time $Ω(\sqrt{n})$-stable instances of MAXCUT, substantially improving the best previously known result.