Source author record

Satoshi Hayakawa

Satoshi Hayakawa appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, math.PR, Artificial Intelligence, Computation and Language, math.NA, math.ST, Statistics Theory

Catalog footprint

What is connected

3 works

7 topics

4 close collaborators

preprint · 2026 · arXiv

Convex-Geometric Error Bounds for Positive-Weight Kernel Quadrature

Kernel quadrature can exploit RKHS spectral structure and outperform Monte Carlo on smooth integrands, but optimized quadrature weights are generally signed and may be numerically unstable. We study whether spectral acceleration remains possible when the weights are constrained to be positive, i.e., simplex weights. In the exact-target fixed-pool setting, an evaluated i.i.d. candidate pool of size $N$ is already available and the task is to reweight it so as to approximate the kernel mean embedding. We show that this positive reweighting problem is governed not by the equal-weight empirical average, but by the random convex hull generated by the pool. Our main geometric result shows that the mean of a bounded $d$-dimensional random vector can be approximated by a convex combination of $N$ i.i.d. samples at accuracy $O(d/N)$ with high probability, sharper than equal-weight averaging in the fixed-dimensional regime. We transfer this $d$-dimensional convex-hull approximation to full RKHS worst-case error through an augmented Mercer-truncation argument. The resulting positive-weight KQ bounds consist of a spectral tail term and a finite-sample convex-hull term, yielding Monte-Carlo-beating rates in favorable spectral regimes, including near-$O(1/N)$ rates up to logarithmic factors under exponential spectral decay. We also provide a constructive Frank--Wolfe algorithm that operates directly on the pool atoms, maintains simplex weights, and admits an explicit optimization-error bound.
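The positive reweighting step described in this abstract can be illustrated with a small Frank-Wolfe loop over the pool atoms. This is only a sketch under stated assumptions, not the paper's algorithm: the one-dimensional Gaussian kernel, the standard normal target (chosen because its kernel mean embedding has a closed form), the exact line search, and all function names are illustrative choices not taken from the source.

```python
import math
import random


def gauss_kernel(x, y, ell=1.0):
    # Assumed kernel: k(x, y) = exp(-(x - y)^2 / (2 ell^2)).
    return math.exp(-((x - y) ** 2) / (2.0 * ell**2))


def mean_embedding(x, ell=1.0):
    # Closed form of E[k(x, Y)] for Y ~ N(0, 1) under the kernel above.
    s2 = ell**2 + 1.0
    return ell / math.sqrt(s2) * math.exp(-(x**2) / (2.0 * s2))


def squared_wce(w, K, z, c):
    # Squared worst-case error (squared MMD): w'Kw - 2 w'z + c.
    n = len(w)
    Kw = [sum(K[i][j] * w[j] for j in range(n)) for i in range(n)]
    quad = sum(w[i] * Kw[i] for i in range(n))
    return quad - 2.0 * sum(w[i] * z[i] for i in range(n)) + c


def frank_wolfe_weights(K, z, steps=200):
    # Minimize 0.5 w'Kw - w'z over the probability simplex.
    n = len(z)
    w = [1.0 / n] * n  # start from the equal-weight average
    Kw = [sum(K[i][j] * w[j] for j in range(n)) for i in range(n)]
    for _ in range(steps):
        grad = [Kw[i] - z[i] for i in range(n)]
        s = min(range(n), key=grad.__getitem__)  # best pool atom (simplex vertex)
        # Direction d = e_s - w; slope g.d and curvature d'Kd along it.
        gd = grad[s] - sum(grad[i] * w[i] for i in range(n))
        dKd = K[s][s] - 2.0 * Kw[s] + sum(w[i] * Kw[i] for i in range(n))
        if gd >= 0.0 or dKd <= 1e-15:
            break  # no descent direction left among the atoms
        gamma = min(1.0, -gd / dKd)  # exact line search, clipped to [0, 1]
        w = [(1.0 - gamma) * wi for wi in w]
        w[s] += gamma  # convex combination, so w stays on the simplex
        Kw = [(1.0 - gamma) * Kw[i] + gamma * K[i][s] for i in range(n)]
    return w


ell = 1.0
rng = random.Random(0)
pool = [rng.gauss(0.0, 1.0) for _ in range(60)]  # i.i.d. candidate pool
K = [[gauss_kernel(a, b, ell) for b in pool] for a in pool]
z = [mean_embedding(a, ell) for a in pool]
c = ell / math.sqrt(ell**2 + 2.0)  # E[k(Y, Y')] for independent Y, Y' ~ N(0, 1)

uniform = [1.0 / len(pool)] * len(pool)
weights = frank_wolfe_weights(K, z)
err_uniform = squared_wce(uniform, K, z, c)
err_fw = squared_wce(weights, K, z, c)
print(err_uniform, err_fw)
```

Because every update is a convex combination of the current weights and a single vertex, the weights remain nonnegative and sum to one, and the line search keeps the objective from increasing relative to the equal-weight start.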

preprint · 2026 · arXiv

Understanding and Accelerating the Training of Masked Diffusion Language Models

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models (ARMs) for language modeling. However, MDMs are known to learn substantially more slowly than ARMs, which may become problematic when scaling MDMs to larger models. Therefore, we ask the following question: how can we accelerate standard MDM training while maintaining its final performance? To this end, we first provide a detailed analysis of why MDM training is slow. We find that the main factor is the locality bias of language: the predictive information for a token is concentrated in nearby positions. We further investigate how this bias slows learning and suggest a simple yet effective remedy: bell-shaped time sampling as a training strategy. Notably, MDMs trained with our training recipe reach the same validation negative log-likelihood (NLL) up to $\sim4\times$ faster than standard training on the One Billion Word Benchmark (LM1B). We also show faster improvements in generative perplexity, zero-shot perplexity, and downstream task performance on various benchmarks.
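The abstract does not say which bell-shaped distribution is used, so the following is a hedged illustration of the general idea only. It assumes a symmetric Beta(a, a) density with a > 1 as the bell shape, a masking probability equal to the sampled time t, and a hypothetical `[MASK]` symbol; none of these choices come from the source.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not from the source


def sample_time_uniform(rng):
    # Baseline: diffusion time drawn uniformly from [0, 1).
    return rng.random()


def sample_time_bell(rng, concentration=3.0):
    # Assumed bell shape: Beta(a, a) with a > 1 is symmetric and peaks at t = 0.5.
    return rng.betavariate(concentration, concentration)


def mask_tokens(tokens, t, rng):
    # Assumed forward process: each token is masked independently with probability t.
    return [MASK if rng.random() < t else tok for tok in tokens]


def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


rng = random.Random(0)
uniform_times = [sample_time_uniform(rng) for _ in range(5000)]
bell_times = [sample_time_bell(rng) for _ in range(5000)]
example = mask_tokens("the cat sat on the mat".split(), sample_time_bell(rng), rng)

print(variance(uniform_times), variance(bell_times))
print(example)
```

Under these assumptions the bell-shaped sampler concentrates training on intermediate masking ratios and draws fewer nearly clean or nearly fully masked sequences than uniform sampling does.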

preprint · 2021 · arXiv

Estimating the probability that a given vector is in the convex hull of a random sample

For a $d$-dimensional random vector $X$, let $p_{n, X}(θ)$ be the probability that the convex hull of $n$ independent copies of $X$ contains a given point $θ$. We provide several sharp inequalities regarding $p_{n, X}(θ)$ and $N_X(θ)$, the smallest $n$ for which $p_{n, X}(θ)\ge1/2$. As a main result, we derive the totally general inequality $1/2 \le α_X(θ)N_X(θ)\le 3d + 1$, where $α_X(θ)$ (a.k.a. the Tukey depth) is the minimum probability that $X$ is in a fixed closed halfspace containing the point $θ$. We also show several applications of our general results: one is a moment-based bound on $N_X(\mathbb{E}[X])$, which is an important quantity in randomized approaches to cubature construction and measure reduction problems. Another application is the determination of the canonical convex body included in a random convex polytope given by independent copies of $X$, where our combinatorial approach allows us to generalize existing results in the random matrix community significantly.
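As a worked illustration of the quantities in this abstract, the sketch below estimates $p_{n, X}(θ)$ by simulation for a standard normal $X$ in $d = 2$ with $θ$ at the origin, and compares it with Wendel's classical formula for symmetric distributions. The choice of distribution, the angular-gap hull test (valid in two dimensions only), and all function names are assumptions made for this example, not material from the paper.

```python
import math
import random


def in_hull_2d(points, theta):
    # In 2D, theta lies in the convex hull iff the directions from theta to the
    # points leave no angular gap larger than pi (no half-plane holds them all).
    tx, ty = theta
    if any(x == tx and y == ty for x, y in points):
        return True
    angles = sorted(math.atan2(y - ty, x - tx) for x, y in points)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2.0 * math.pi - angles[-1])  # wrap-around gap
    return max(gaps) <= math.pi


def estimate_p(n, trials, rng):
    # Monte Carlo estimate of p_{n,X}(0) for X ~ N(0, I_2).
    hits = 0
    for _ in range(trials):
        pts = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
        hits += in_hull_2d(pts, (0.0, 0.0))
    return hits / trials


def wendel(n, d):
    # Wendel's formula: probability that the hull of n i.i.d. points from a
    # symmetric continuous distribution in R^d contains the origin.
    return 1.0 - sum(math.comb(n - 1, k) for k in range(d)) / 2 ** (n - 1)


rng = random.Random(0)
d = 2
p_hat = estimate_p(5, 20000, rng)
p_exact = wendel(5, d)  # 1 - 5/16 = 0.6875

# N_X(0): smallest n with p_{n,X}(0) >= 1/2.
n_half = next(n for n in range(1, 100) if wendel(n, d) >= 0.5)
alpha = 0.5  # Tukey depth of the origin for a centrally symmetric continuous law
print(p_hat, p_exact, n_half, alpha * n_half)
```

In this example $N_X(0) = 4$ and $α_X(0) = 1/2$, so $α_X(0) N_X(0) = 2$, which sits inside the interval $[1/2, 3d + 1] = [1/2, 7]$ given by the inequality in the abstract.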

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint