Source author record

Yuxin Sun

Yuxin Sun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Data Structures and Algorithms math.ST Statistics Theory math.PR

Catalog footprint

What is connected

5works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2025arXiv

Revisiting Agnostic Boosting

Boosting is a key method in statistical learning, allowing for converting weak learners into strong ones. While well studied in the realizable case, the statistical properties of weak-to-strong learning remain less understood in the agnostic setting, where there are no assumptions on the distribution of the labels. In this work, we propose a new agnostic boosting algorithm with substantially improved sample complexity compared to prior works under very general assumptions. Our approach is based on a reduction to the realizable case, followed by a margin-based filtering of high-quality hypotheses. Furthermore, we show a nearly-matching lower bound, settling the sample complexity of agnostic boosting up to logarithmic factors.

preprint2022arXiv

Optimal SQ Lower Bounds for Robustly Learning Discrete Product Distributions and Ising Models

We establish optimal Statistical Query (SQ) lower bounds for robustly learning certain families of discrete high-dimensional distributions. In particular, we show that no efficient SQ algorithm with access to an $ε$-corrupted binary product distribution can learn its mean within $\ell_2$-error $o(ε\sqrt{\log(1/ε)})$. Similarly, we show that no efficient SQ algorithm with access to an $ε$-corrupted ferromagnetic high-temperature Ising model can learn the model to total variation distance $o(ε\log(1/ε))$. Our SQ lower bounds match the error guarantees of known algorithms for these problems, providing evidence that current upper bounds for these tasks are best possible. At the technical level, we develop a generic SQ lower bound for discrete high-dimensional distributions starting from low dimensional moment matching constructions that we believe will find other applications. Additionally, we introduce new ideas to analyze these moment-matching constructions for discrete univariate distributions.

preprint2022arXiv

What can be sampled locally?

The local computation of Linial [FOCS'87] and Naor and Stockmeyer [STOC'93] concerns with the question of whether a locally definable distributed computing problem can be solved locally: for a given local CSP whether a CSP solution can be constructed by a distributed algorithm using local information. In this paper, we consider the problem of sampling a uniform CSP solution by distributed algorithms, and ask whether a locally definable joint distribution can be sampled from locally. More broadly, we consider sampling from Gibbs distributions induced by weighted local CSPs in the LOCAL model. We give two Markov chain based distributed algorithms which we believe to represent two fundamental approaches for sampling from Gibbs distributions via distributed algorithms. The first algorithm generically parallelizes the single-site sequential Markov chain by iteratively updating a random independent set of variables in parallel, and achieves an $O(Δ\log n)$ time upper bound in the LOCAL model, where $Δ$ is the maximum degree, when the Dobrushin's condition for the Gibbs distribution is satisfied. The second algorithm is a novel parallel Markov chain which proposes to update all variables simultaneously yet still guarantees to converge correctly with no bias. It surprisingly parallelizes an intrinsically sequential process: stabilizing to a joint distribution with massive local dependencies, and may achieve an optimal $O(\log n)$ time upper bound independent of the maximum degree $Δ$ under a stronger mixing condition. We also show a strong $Ω(diam)$ lower bound for sampling independent set in graphs with maximum degree $Δ\ge 6$. This lower bound holds even when every node is aware of the graph. This gives a strong separation between sampling and constructing locally checkable labelings.

preprint2021arXiv

Outlier-Robust Learning of Ising Models Under Dobrushin's Condition

We study the problem of learning Ising models satisfying Dobrushin's condition in the outlier-robust setting where a constant fraction of the samples are adversarially corrupted. Our main result is to provide the first computationally efficient robust learning algorithm for this problem with near-optimal error guarantees. Our algorithm can be seen as a special case of an algorithm for robustly learning a distribution from a general exponential family. To prove its correctness for Ising models, we establish new anti-concentration results for degree-$2$ polynomials of Ising models that may be of independent interest.

preprint2020arXiv

Correlated Feature Selection with Extended Exclusive Group Lasso

In many high dimensional classification or regression problems set in a biological context, the complete identification of the set of informative features is often as important as predictive accuracy, since this can provide mechanistic insight and conceptual understanding. Lasso and related algorithms have been widely used since their sparse solutions naturally identify a set of informative features. However, Lasso performs erratically when features are correlated. This limits the use of such algorithms in biological problems, where features such as genes often work together in pathways, leading to sets of highly correlated features. In this paper, we examine the performance of a Lasso derivative, the exclusive group Lasso, in this setting. We propose fast algorithms to solve the exclusive group Lasso, and introduce a solution to the case when the underlying group structure is unknown. The solution combines stability selection with random group allocation and introduction of artificial features. Experiments with both synthetic and real-world data highlight the advantages of this proposed methodology over Lasso in comprehensive selection of informative features.