Source author record

Ya-Ping Hsieh

Ya-Ping Hsieh appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, math.OC, Applications, cond-mat.mtrl-sci, Data Structures and Algorithms, Information Theory, math.IT, math.ST, Molecular Networks, physics.optics, Statistics Theory

Catalog footprint

What is connected

10 works

11 topics

4 close collaborators

Preprint, 2026, arXiv

Support Before Frequency in Discrete Diffusion

Discrete diffusion models are increasingly competitive for language modeling, yet it remains unclear how their denoising objectives organize learning. Although these objectives target the full data distribution, we show that the exact reverse process induces a hierarchy between coarse support information and finer frequency information. For uniform and absorbing (a.k.a. masking) diffusion, we prove that, in the small-noise regime of the final denoising steps, each single-token reverse edit decomposes into a leading scale, determined by whether it moves toward the data support (e.g., grammatically valid sentences), and a finer coefficient, determining relative probabilities within the same scale. Thus, recovering validity structure only requires learning the correct order of magnitude of reverse probabilities, whereas recovering data frequencies requires coefficient-level estimation. The separation is mechanism-dependent: uniform diffusion exhibits a trichotomy into validity-improving, validity-preserving, and validity-worsening edits, while absorbing diffusion places its leading-order mass on validity-improving moves. Experiments on a masked language diffusion model and synthetic regular-language tasks support these predictions: support-localization emerges earlier than within-support frequency ranking, and the contrast between uniform and absorbing diffusion matches the predicted rate separation. Together, our results suggest that discrete diffusion models learn data support before data frequencies.

Preprint, 2022, arXiv

Continuous-time Analysis for Variational Inequalities: An Overview and Desiderata

Algorithms that solve zero-sum games, multi-objective agent objectives, or, more generally, variational inequality (VI) problems are notoriously unstable on general problems. Owing to the increasing need for solving such problems in machine learning, this instability has been highlighted in recent years as a significant research challenge. In this paper, we provide an overview of recent progress in the use of continuous-time perspectives in the analysis and design of methods targeting the broad VI problem class. Our presentation draws parallels between single-objective problems and multi-objective problems, highlighting the challenges of the latter. We also formulate various desiderata for algorithms that apply to general VIs and we argue that achieving these desiderata may profit from an understanding of the associated continuous-time dynamics.

Preprint, 2021, arXiv

The limits of min-max optimization algorithms: convergence to spurious non-critical sets

Compared to ordinary function minimization problems, min-max optimization algorithms encounter far greater challenges because of the existence of periodic cycles and similar phenomena. Even though some of these behaviors can be overcome in the convex-concave regime, the general case is considerably more difficult. On that account, we take an in-depth look at a comprehensive class of state-of-the-art algorithms and prevalent heuristics in non-convex / non-concave problems, and we establish the following general results: a) generically, the algorithms' limit points are contained in the ICT sets of a common, mean-field system; b) the attractors of this system also attract the algorithms in question with arbitrarily high probability; and c) all algorithms avoid the system's unstable sets with probability 1. On the surface, this provides a highly optimistic outlook for min-max algorithms; however, we show that there exist spurious attractors that do not contain any stationary points of the problem under study. In this regard, our work suggests that existing min-max algorithms may be subject to inescapable convergence failures. We complement our theoretical analysis by illustrating such attractors in simple, two-dimensional, almost bilinear problems.

Preprint, 2020, arXiv

Conditional gradient methods for stochastically constrained convex minimization

We propose two novel conditional gradient-based methods for solving structured stochastic convex optimization problems with a large number of linear constraints. Instances of this template naturally arise from SDP-relaxations of combinatorial problems, which involve a number of constraints that is polynomial in the problem dimension. The most important feature of our framework is that only a subset of the constraints is processed at each iteration, thus gaining a computational advantage over prior works that require full passes. Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees. Preliminary numerical experiments are provided for illustrating the practical performance of the methods.

Preprint, 2020, arXiv

Mirrored Langevin Dynamics

We consider the problem of sampling from constrained distributions, which has posed significant challenges to both non-asymptotic analysis and algorithmic design. We propose a unified framework, which is inspired by the classical mirror descent, to derive novel first-order sampling schemes. We prove that, for a general target distribution with strongly convex potential, our framework implies the existence of a first-order algorithm achieving $\tilde{O}(\epsilon^{-2}d)$ convergence, suggesting that the state-of-the-art $\tilde{O}(\epsilon^{-6}d^5)$ can be vastly improved. With the important Latent Dirichlet Allocation (LDA) application in mind, we specialize our algorithm to sample from Dirichlet posteriors, and derive the first non-asymptotic $\tilde{O}(\epsilon^{-2}d^2)$ rate for first-order sampling. We further extend our framework to the mini-batch setting and prove convergence rates when only stochastic gradients are available. Finally, we report promising experimental results for LDA on real datasets.

Preprint, 2016, arXiv

An Efficient Streaming Algorithm for the Submodular Cover Problem

We initiate the study of the classical Submodular Cover (SC) problem in the data streaming model, which we refer to as the Streaming Submodular Cover (SSC). We show that any single-pass streaming algorithm using sublinear memory in the size of the stream will fail to provide any non-trivial approximation guarantees for SSC. Hence, we consider a relaxed version of SSC, where we only seek to find a partial cover. We design the first Efficient bicriteria Submodular Cover Streaming (ESC-Streaming) algorithm for this problem, and provide theoretical guarantees for its performance supported by numerical evidence. Our algorithm finds solutions that are competitive with the near-optimal offline greedy algorithm despite requiring only a single pass over the data stream. In our numerical experiments, we evaluate the performance of ESC-Streaming on active set selection and large-scale graph cover problems.

Preprint, 2016, arXiv

Frank-Wolfe Works for Non-Lipschitz Continuous Gradient Objectives: Scalable Poisson Phase Retrieval

We study a phase retrieval problem in the Poisson noise model. Motivated by the PhaseLift approach, we approximate the maximum-likelihood estimator by solving a convex program with a nuclear norm constraint. While the Frank-Wolfe algorithm, together with the Lanczos method, can efficiently deal with nuclear norm constraints, our objective function does not have a Lipschitz continuous gradient, and hence existing convergence guarantees for the Frank-Wolfe algorithm do not apply. In this paper, we show that the Frank-Wolfe algorithm works for the Poisson phase retrieval problem, and has a global convergence rate of O(1/t), where t is the iteration counter. We provide a rigorous theoretical guarantee and illustrative numerical results.

Preprint, 2016, arXiv

Nanostructure analysis of InGaN/GaN quantum wells based on semi-polar-faced GaN nanorods

We demonstrate a series of InGaN/GaN double quantum well nanostructure elements. We grow a layer of 2 μm undoped GaN template on top of a (0001)-direction sapphire substrate. A 100 nm SiO2 thin film is deposited on top as a masking pattern layer. This layer is then covered with a 300 nm aluminum layer as the anodic aluminum oxide (AAO) hole pattern layer. After oxalic acid etching, we transfer the hole pattern from the AAO layer to the SiO2 layer by reactive ion etching. Lastly, we utilize metal-organic chemical vapor deposition to grow GaN nanorods approximately 1.5 μm in size. We then grow two layers of InGaN/GaN double quantum wells on the semi-polar face of the GaN nanorod substrate under different temperatures, and study the characteristics of the quantum wells formed there. We report the following findings from our study: first, using SiO2 with a repeating hole pattern, we are able to grow high-quality GaN nanorods with diameters of approximately 80-120 nm; second, photoluminescence (PL) measurements enable us to identify the Fabry-Perot effect from InGaN/GaN quantum wells on the semi-polar face. We calculate the quantum wells' cavity thickness from the obtained PL measurements. Lastly, high-resolution TEM images allow us to study the lattice structure characteristics of InGaN/GaN quantum wells on GaN nanorods and identify the existence of threading dislocations in the lattice structure that affect the GaN nanorods' growth mechanism.

Preprint, 2015, arXiv

A Geometric View on Constrained M-Estimators

We study the estimation error of constrained M-estimators, and derive explicit upper bounds on the expected estimation error determined by the Gaussian width of the constraint set. Both of the cases where the true parameter is on the boundary of the constraint set (matched constraint), and where the true parameter is strictly in the constraint set (mismatched constraint) are considered. For both cases, we derive novel universal estimation error bounds for regression in a generalized linear model with the canonical link function. Our error bound for the mismatched constraint case is minimax optimal in terms of its dependence on the sample size, for Gaussian linear regression by the Lasso.

Preprint, 2013, arXiv

Mathematical Foundations for Information Theory in Diffusion-Based Molecular Communications

Molecular communication is emerging as a promising communication paradigm for nanotechnology. However, solid mathematical foundations for information-theoretic analysis of molecular communication have not yet been built. In particular, no one has ever proven that the channel coding theorem applies to molecular communication, and no relationship between information rate capacity (maximum mutual information) and code rate capacity (supremum achievable code rate) has been established. In this paper, we focus on a major subclass of molecular communication: diffusion-based molecular communication. We provide solid mathematical foundations for information theory in diffusion-based molecular communication by creating a general diffusion-based molecular channel model in measure-theoretic form and proving its channel coding theorems. Various equivalence relationships between statistical and operational definitions of channel capacity are also established, including the most classic information rate capacity and code rate capacity. As byproducts, we show that the diffusion-based molecular channel has "asymptotically decreasing input memory and anticipation" and is "d-continuous". Other properties of the diffusion-based molecular channel, such as stationarity and ergodicity, are also proven.
