Researcher profile

Chieh-Hsin Lai

Chieh-Hsin Lai contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2026arXiv

Understanding and Accelerating the Training of Masked Diffusion Language Models

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models (ARMs) for language modeling. However, MDMs are known to learn substantially more slowly than ARMs, which may become problematic when scaling MDMs to larger models. Therefore, we ask the following question: how can we accelerate standard MDM training while maintaining its final performance? To this end, we first provide a detailed analysis of why MDM training is slow. We find that the main factor is the locality bias of language: the predictive information for a token is concentrated in nearby positions. We further investigate how this bias slows learning and suggest a simple yet effective remedy: bell-shaped time sampling as a training strategy. Notably, MDMs trained with our training recipe reach the same validation negative log-likelihood (NLL) up to $\sim4\times$ faster than standard training on One Billion Word Benchmark (LM1B). We also show faster improvements in generative perplexity, zero-shot perplexity, and downstream task performance on various benchmarks.

preprint2022arXiv

Preventing Oversmoothing in VAE via Generalized Variance Parameterization

Variational autoencoders (VAEs) often suffer from posterior collapse, which is a phenomenon in which the learned latent space becomes uninformative. This is often related to the hyperparameter resembling the data variance. It can be shown that an inappropriate choice of this hyperparameter causes the oversmoothness in the linearly approximated case and can be empirically verified for the general cases. Moreover, determining such appropriate choice becomes infeasible if the data variance is non-uniform or conditional. Therefore, we propose VAE extensions with generalized parameterizations of the data variance and incorporate maximum likelihood estimation into the objective function to adaptively regularize the decoder smoothness. The images generated from proposed VAE extensions show improved Fréchet inception distance (FID) on MNIST and CelebA datasets.

preprint2022arXiv

SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

One noted issue of vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook, also known as codebook collapse. We hypothesize that the training scheme of VQ-VAE, which involves some carefully designed heuristics, underlies this issue. In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called stochastically quantized variational autoencoder (SQ-VAE). In SQ-VAE, we observe a trend that the quantization is stochastic at the initial stage of the training but gradually converges toward a deterministic quantization, which we call self-annealing. Our experiments show that SQ-VAE improves codebook utilization without using common heuristics. Furthermore, we empirically show that SQ-VAE is superior to VAE and VQ-VAE in vision- and speech-related tasks.

preprint2020arXiv

Inverse Problems, Deep Learning, and Symmetry Breaking

In many physical systems, inputs related by intrinsic system symmetries are mapped to the same output. When inverting such systems, i.e., solving the associated inverse problems, there is no unique solution. This causes fundamental difficulties for deploying the emerging end-to-end deep learning approach. Using the generalized phase retrieval problem as an illustrative example, we show that careful symmetry breaking on the training data can help get rid of the difficulties and significantly improve the learning performance. We also extract and highlight the underlying mathematical principle of the proposed solution, which is directly applicable to other inverse problems.

preprint2019arXiv

Robust Subspace Recovery Layer for Unsupervised Anomaly Detection

We propose a neural network for unsupervised anomaly detection with a novel robust subspace recovery layer (RSR layer). This layer seeks to extract the underlying subspace from a latent representation of the given data and removes outliers that lie away from this subspace. It is used within an autoencoder. The encoder maps the data into a latent space, from which the RSR layer extracts the subspace. The decoder then smoothly maps back the underlying subspace to a "manifold" close to the original inliers. Inliers and outliers are distinguished according to the distances between the original and mapped positions (small for inliers and large for outliers). Extensive numerical experiments with both image and document datasets demonstrate state-of-the-art precision and recall.