Source author record

Themis Gouleakis

Themis Gouleakis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Machine Learning math.ST Statistics Theory Distributed, Parallel, and Cluster Computing Computational Complexity Information Theory math.IT math.PR

Catalog footprint

What is connected

8works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Mean Testing under Truncation beyond Gaussian

We characterize the fundamental limits of high-dimensional mean testing under arbitrary truncation, where samples are drawn from the conditional distribution $P(\cdot \mid S)$ for an unknown truncation set $S$ that may hide up to an $\varepsilon$-fraction of the probability mass. For distributions with $p$-th directional moments of magnitude at most $ν_{P,p}$, truncation induces a bias of order $O(ν_{P,p}\varepsilon^{1-1/p})$. This bias creates a sharp information-theoretic detectability floor: when the signal $α$ falls below this threshold, the null and alternative hypotheses are indistinguishable even with infinite data. Above this floor, we prove that a simple second-order test achieving near-optimal sample complexity $n = O\!\left(\frac{\|Σ_P\|}{(α-4ν_{P,p}\varepsilon^{1-1/p})^2}\sqrt{d}\right)$. We further identify a structural escape from this finite-moment bias barrier. Under a directional median regularity assumption, truncation bias improves to linear order $O(\varepsilon)$. This reveals an intermediate regime in which estimation requires $Θ(d)$ samples for uniform recovery, while testing recovers the classical $Θ(\sqrt d)$ rate once truncation bias is eliminated. Together, our results provide a unified framework for mean testing under truncation, connecting finite-moment, sub-Gaussian, and median-regular structural regimes.

preprint2022arXiv

Almost Universally Optimal Distributed Laplacian Solvers via Low-Congestion Shortcuts

In this paper, we refine the (almost) \emph{existentially optimal} distributed Laplacian solver recently developed by Forster, Goranci, Liu, Peng, Sun, and Ye (FOCS `21) into an (almost) \emph{universally optimal} distributed Laplacian solver. Specifically, when the topology is known, we show that any Laplacian system on an $n$-node graph with \emph{shortcut quality} $\text{SQ}(G)$ can be solved within $n^{o(1)} \text{SQ}(G) \log(1/\varepsilon)$ rounds, where $\varepsilon$ is the required accuracy. This almost matches our lower bound which guarantees that any correct algorithm on $G$ requires $\widetildeΩ(\text{SQ}(G))$ rounds, even for a crude solution with $\varepsilon \le 1/2$. Even in the unknown-topology case (i.e., standard CONGEST), the same bounds also hold in most networks of interest. Furthermore, conditional on conjectured improvements in state-of-the-art constructions of low-congestion shortcuts, the CONGEST results will match the known-topology ones. Moreover, following a recent line of work in distributed algorithms, we consider a hybrid communication model which enhances CONGEST with limited global power in the form of the node-capacitated clique (NCC) model. In this model, we show the existence of a Laplacian solver with round complexity $n^{o(1)} \log(1/\varepsilon)$. The unifying thread of these results, and our main technical contribution, is the study of novel \emph{congested} generalization of the standard \emph{part-wise aggregation} problem. We develop near-optimal algorithms for this primitive in the Supported-CONGEST model, almost-optimal algorithms in (standard) CONGEST, as well as a very simple algorithm for bounded-treewidth graphs with slightly worse bounds. This primitive can be readily used to accelerate the FOCS`21 Laplacian solver. We believe this primitive will find further independent applications.

preprint2022arXiv

Improved Massively Parallel Computation Algorithms for MIS, Matching, and Vertex Cover

We present $O(\log\log n)$-round algorithms in the Massively Parallel Computation (MPC) model, with $\tilde{O}(n)$ memory per machine, that compute a maximal independent set, a $1+ε$ approximation of maximum matching, and a $2+ε$ approximation of minimum vertex cover, for any $n$-vertex graph and any constant $ε>0$. These improve the state of the art as follows: - Our MIS algorithm leads to a simple $O(\log\log Δ)$-round MIS algorithm in the Congested Clique model of distributed computing, which improves on the $\tilde{O}(\sqrt{\log Δ})$-round algorithm of Ghaffari [PODC'17]. - Our $O(\log\log n)$-round $(1+ε)$-approximate maximum matching algorithm simplifies or improves on the following prior work: $O(\log^2\log n)$-round $(1+ε)$-approximation algorithm of Czumaj et al. [STOC'18] and $O(\log\log n)$-round $(1+ε)$-approximation algorithm of Assadi et al. [SODA'19]. - Our $O(\log\log n)$-round $(2+ε)$-approximate minimum vertex cover algorithm improves on an $O(\log\log n)$-round $O(1)$-approximation of Assadi et al. [arXiv'17].

preprint2022arXiv

Learning-Augmented Algorithms for Online TSP on the Line

We study the online Traveling Salesman Problem (TSP) on the line augmented with machine-learned predictions. In the classical problem, there is a stream of requests released over time along the real line. The goal is to minimize the makespan of the algorithm. We distinguish between the open variant and the closed one, in which we additionally require the algorithm to return to the origin after serving all requests. The state of the art is a $1.64$-competitive algorithm and a $2.04$-competitive algorithm for the closed and open variants, respectively \cite{Bjelde:1.64}. In both cases, a tight lower bound is known \cite{Ausiello:1.75, Bjelde:1.64}. In both variants, our primary prediction model involves predicted positions of the requests. We introduce algorithms that (i) obtain a tight 1.5 competitive ratio for the closed variant and a 1.66 competitive ratio for the open variant in the case of perfect predictions, (ii) are robust against unbounded prediction error, and (iii) are smooth, i.e., their performance degrades gracefully as the prediction error increases. Moreover, we further investigate the learning-augmented setting in the open variant by additionally considering a prediction for the last request served by the optimal offline algorithm. Our algorithm for this enhanced setting obtains a 1.33 competitive ratio with perfect predictions while also being smooth and robust, beating the lower bound of 1.44 we show for our original prediction setting for the open variant. Also, we provide a lower bound of 1.25 for this enhanced setting.

preprint2020arXiv

Optimal Testing of Discrete Distributions with High Probability

We study the problem of testing discrete distributions with a focus on the high probability regime. Specifically, given samples from one or more discrete distributions, a property $\mathcal{P}$, and parameters $0< ε, δ<1$, we want to distinguish {\em with probability at least $1-δ$} whether these distributions satisfy $\mathcal{P}$ or are $ε$-far from $\mathcal{P}$ in total variation distance. Most prior work in distribution testing studied the constant confidence case (corresponding to $δ= Ω(1)$), and provided sample-optimal testers for a range of properties. While one can always boost the confidence probability of any such tester by black-box amplification, this generic boosting method typically leads to sub-optimal sample bounds. Here we study the following broad question: For a given property $\mathcal{P}$, can we {\em characterize} the sample complexity of testing $\mathcal{P}$ as a function of all relevant problem parameters, including the error probability $δ$? Prior to this work, uniformity testing was the only statistical task whose sample complexity had been characterized in this setting. As our main results, we provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors, as a function of all relevant parameters. We also show matching information-theoretic lower bounds on the sample complexity of these problems. Our techniques naturally extend to give optimal testers for related problems. To illustrate the generality of our methods, we give optimal algorithms for testing collections of distributions and testing closeness with unequal sized samples.

preprint2020arXiv

Simple Local Computation Algorithms for the General Lovasz Local Lemma

We consider the task of designing Local Computation Algorithms (LCA) for applications of the Lovász Local Lemma (LLL). LCA is a class of sublinear algorithms proposed by Rubinfeld et al.~\cite{Ronitt} that have received a lot of attention in recent years. The LLL is an existential, sufficient condition for a collection of sets to have non-empty intersection (in applications, often, each set comprises all objects having a certain property). The ground-breaking algorithm of Moser and Tardos~\cite{MT} made the LLL fully constructive, following earlier results by Beck~\cite{beck_lll} and Alon~\cite{alon_lll} giving algorithms under significantly stronger LLL-like conditions. LCAs under those stronger conditions were given in~\cite{Ronitt}, where it was asked if the Moser-Tardos algorithm can be used to design LCAs under the standard LLL condition. The main contribution of this paper is to answer this question affirmatively. In fact, our techniques yield LCAs for settings beyond the standard LLL condition.

preprint2016arXiv

Collision-based Testers are Optimal for Uniformity and Closeness

We study the fundamental problems of (i) uniformity testing of a discrete distribution, and (ii) closeness testing between two discrete distributions with bounded $\ell_2$-norm. These problems have been extensively studied in distribution testing and sample-optimal estimators are known for them~\cite{Paninski:08, CDVV14, VV14, DKN:15}. In this work, we show that the original collision-based testers proposed for these problems ~\cite{GRdist:00, BFR+:00} are sample-optimal, up to constant factors. Previous analyses showed sample complexity upper bounds for these testers that are optimal as a function of the domain size $n$, but suboptimal by polynomial factors in the error parameter $ε$. Our main contribution is a new tight analysis establishing that these collision-based testers are information-theoretically optimal, up to constant factors, both in the dependence on $n$ and in the dependence on $ε$.

preprint2016arXiv

Testing Shape Restrictions of Discrete Distributions

We study the question of testing structured properties (classes) of discrete distributions. Specifically, given sample access to an arbitrary distribution $D$ over $[n]$ and a property $\mathcal{P}$, the goal is to distinguish between $D\in\mathcal{P}$ and $\ell_1(D,\mathcal{P})>\varepsilon$. We develop a general algorithm for this question, which applies to a large range of "shape-constrained" properties, including monotone, log-concave, $t$-modal, piecewise-polynomial, and Poisson Binomial distributions. Moreover, for all cases considered, our algorithm has near-optimal sample complexity with regard to the domain size and is computationally efficient. For most of these classes, we provide the first non-trivial tester in the literature. In addition, we also describe a generic method to prove lower bounds for this problem, and use it to show our upper bounds are nearly tight. Finally, we extend some of our techniques to tolerant testing, deriving nearly-tight upper and lower bounds for the corresponding questions.

Themis Gouleakis

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

Mean Testing under Truncation beyond Gaussian

Almost Universally Optimal Distributed Laplacian Solvers via Low-Congestion Shortcuts

Improved Massively Parallel Computation Algorithms for MIS, Matching, and Vertex Cover

Learning-Augmented Algorithms for Online TSP on the Line

Optimal Testing of Discrete Distributions with High Probability

Simple Local Computation Algorithms for the General Lovasz Local Lemma

Collision-based Testers are Optimal for Uniformity and Closeness

Testing Shape Restrictions of Discrete Distributions