Source author record

Johan Lim

Johan Lim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Methodology, Applications, Computation, Machine Learning

Catalog footprint

What is connected

9 works

4 topics

4 close collaborators

Preprint, 2025, arXiv

$\ell_0$-Regularized Item Response Theory Model for Robust Ideal Point Estimation

Ideal point estimation methods face a significant challenge when legislators engage in protest voting, that is, strategically voting against their party to express dissatisfaction. Such votes introduce attenuation bias, making ideologically extreme legislators appear artificially moderate. We propose a novel statistical framework that extends the fast EM-based estimation approach of Imai et al. (2016) with $\ell_0$ regularization to handle protest votes. Through simulation studies, we demonstrate that our proposed method maintains estimation accuracy even with high proportions of protest votes, while being substantially faster than MCMC-based methods. Applying our method to the 116th and 117th U.S. House of Representatives, we successfully recover the extreme liberal positions of "the Squad", whose protest votes had caused conventional methods to misclassify them as moderates. While conventional methods rank Ocasio-Cortez as more conservative than 69% of Democrats, our method places her firmly in the progressive wing, aligning with her documented policy positions. This approach provides both robust ideal point estimates and systematic identification of protest votes, facilitating deeper analysis of strategic voting behavior in legislatures.

Preprint, 2025, arXiv

Empirical Bayes Method for Large Scale Multiple Testing with Heteroscedastic Errors

In this paper, we address the normal mean inference problem, which involves testing multiple means of normal random variables with heteroscedastic variances. Most existing empirical Bayes methods for this setting are developed under restrictive assumptions, such as the scaled inverse-chi-squared prior for variances and unimodality for the non-null mean distribution. However, when either of these assumptions is violated, these methods often fail to control the false discovery rate (FDR) at the target level or suffer from a substantial loss of power. To overcome these limitations, we propose a new empirical Bayes method, gg-Mix, which assumes only independence between the normal means and variances, without imposing any structural restrictions on their distributions. We thoroughly evaluate the FDR control and power of gg-Mix through extensive numerical studies and demonstrate its superior performance compared to existing methods. Finally, we apply gg-Mix to three real data examples to further illustrate the practical advantages of our approach.

Preprint, 2022, arXiv

Empirical Likelihood Inference for Area under the ROC Curve using Ranked Set Samples

The area under a receiver operating characteristic curve (AUC) is a useful tool to assess the performance of continuous-scale diagnostic tests on binary classification. In this article, we propose an empirical likelihood (EL) method to construct confidence intervals for the AUC from data collected by ranked set sampling (RSS). The proposed EL-based method enables inferences without assumptions required in existing nonparametric methods and takes advantage of the sampling efficiency of RSS. We show that for both balanced and unbalanced RSS, the EL-based point estimate is the Mann-Whitney statistic, and confidence intervals can be obtained from a scaled chi-square distribution. Simulation studies and two case studies on diabetes and chronic kidney disease data suggest that using the proposed method and RSS enables more efficient inference on the AUC.
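As a minimal sketch of the point estimate mentioned in this abstract, the Mann-Whitney statistic estimates the AUC as the fraction of (diseased, healthy) pairs that the test orders correctly, counting ties as one half. The function name and the toy scores are illustrative assumptions. The sketch treats the data as a simple random sample; it does not reproduce the ranked set sampling design or the empirical likelihood confidence intervals of the paper.

```python
def mann_whitney_auc(diseased, healthy):
    """Estimate AUC = P(X > Y) + 0.5 * P(X == Y) over all (X, Y) pairs.

    X is a test score from the diseased group, Y from the healthy group.
    """
    if not diseased or not healthy:
        raise ValueError("both groups must be non-empty")
    total = 0.0
    for x in diseased:
        for y in healthy:
            if x > y:
                total += 1.0   # pair ordered correctly
            elif x == y:
                total += 0.5   # tie counts as half
    return total / (len(diseased) * len(healthy))


# Hypothetical diagnostic scores (not from the paper's case studies).
auc = mann_whitney_auc([2, 3, 4], [1, 2, 3])
print(auc)
```

The double loop is quadratic in the sample sizes, which is fine for illustration; a rank-based formula would be the usual choice for large samples.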

Preprint, 2022, arXiv

Testing Independence of Bivariate Censored Data using Random Walk on Restricted Permutation Graph

In this paper, we propose a procedure to test the independence of bivariate censored data that is generic and applicable to any censoring type in the literature. To test the hypothesis, we consider a rank-based statistic, Kendall's tau. The censored data define a restricted permutation space of all possible ranks of the observations. Our test statistic is the average of Kendall's tau over the ranks in the restricted permutation space. To evaluate the statistic and its reference distribution, we develop a Markov chain Monte Carlo (MCMC) procedure to obtain uniform samples on the restricted permutation space and numerically approximate the null distribution of the averaged Kendall's tau. We apply the procedure to three real data examples with different censoring types and compare the results with those from existing methods. We conclude the paper with some additional discussions not given in the main body of the paper.

Preprint, 2020, arXiv

Covariate-dependent control limits for the detection of abnormal price changes in scanner data

Currently, large-scale sales data for consumer goods, called scanner data, are obtained by scanning the bar codes of individual products at the points of sale of retail outlets. Many national statistical offices use scanner data to build consumer price statistics. In this process, as in other statistical procedures, the detection of abnormal transactions in sales prices is an important step in the analysis. Popular methods for conducting such outlier detection are the quartile method, the Hidiroglou-Berthelot method, the resistant fences method, and the Tukey algorithm. These methods are based solely on information about price changes and not on any of the other covariates (e.g., sales volume or types of retail shops) that are also available from scanner data. In this paper, we propose a new method to detect abnormal price changes that takes into account an additional covariate, namely, sales volume. We assume that the variance of the log of the price change is a smooth function of the sales volume and estimate the function from previously observed data. We numerically show the advantages of the new method over existing methods. We also apply the methods to real scanner data collected at weekly intervals by the Korean Chamber of Commerce and Industry between 2013 and 2014 and compare their performance.
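The idea of a control limit that depends on sales volume can be sketched as follows. This is an illustration under stated assumptions, not the authors' estimator: the Gaussian kernel smoother on log volume, the bandwidth, the multiplier `k`, the zero-mean assumption for log price changes, and the synthetic history are all hypothetical choices made here.

```python
import math
import random


def smoothed_sd(hist_volume, hist_log_change, volume, bandwidth=0.5):
    """Kernel-weighted standard deviation of log price changes near `volume`.

    Assumes log price changes are centered at zero, so the variance is
    estimated by a weighted mean of squared log changes, with Gaussian
    weights on the distance in log sales volume.
    """
    target = math.log(volume)
    num = den = 0.0
    for v, d in zip(hist_volume, hist_log_change):
        w = math.exp(-0.5 * ((math.log(v) - target) / bandwidth) ** 2)
        num += w * d * d
        den += w
    return math.sqrt(num / den)


def is_abnormal(log_change, volume, hist_volume, hist_log_change, k=3.0):
    """Flag a price change whose log exceeds k smoothed standard deviations."""
    return abs(log_change) > k * smoothed_sd(hist_volume, hist_log_change, volume)


# Synthetic history: price changes are noisier for low-volume products.
rng = random.Random(0)
hist_volume = [math.exp(rng.uniform(0.0, math.log(1000.0))) for _ in range(2000)]
hist_log_change = [rng.gauss(0.0, 0.5 / math.sqrt(v)) for v in hist_volume]

# The same log price change is judged against different limits.
low_volume_flag = is_abnormal(0.3, 2.0, hist_volume, hist_log_change)
high_volume_flag = is_abnormal(0.3, 800.0, hist_volume, hist_log_change)
print(low_volume_flag, high_volume_flag)
```

In this synthetic setup the limit is wide where historical prices were volatile (low volume) and tight where they were stable (high volume), which is the behavior a limit based on price changes alone cannot provide.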

Preprint, 2016, arXiv

Empirical Null Estimation using Discrete Mixture Distributions and its Application to Protein Domain Data

In recent mutation studies, analyses based on protein domain positions are gaining popularity over gene-centric approaches since the latter have limitations in considering the functional context that the position of the mutation provides. This presents a large-scale simultaneous inference problem, with hundreds of hypothesis tests to consider at the same time. This paper aims to select significant mutation counts while controlling Type I error at a given level via false discovery rate (FDR) procedures. One main assumption is that there exists a cut-off value such that counts smaller than this value are generated from the null distribution. We present several data-dependent methods to determine the cut-off value. We also consider a two-stage procedure based on a screening process, in which mutation counts exceeding a certain value are considered significant. Simulated and protein domain data sets are used to illustrate this procedure in estimation of the empirical null using a mixture of discrete distributions.

Preprint, 2013, arXiv

High-dimensional Fused Lasso Regression using Majorization-Minimization and Parallel Processing

In this paper, we propose a majorization-minimization (MM) algorithm for high-dimensional fused lasso regression (FLR) suitable for parallelization using graphics processing units (GPUs). The MM algorithm is stable and flexible as it can solve the FLR problems with various types of design matrices and penalty structures within a few tens of iterations. We also show that the convergence of the proposed algorithm is guaranteed. We conduct numerical studies to compare our algorithm with other existing algorithms, demonstrating that the proposed MM algorithm is competitive in many settings including the two-dimensional FLR with arbitrary design matrices. The merit of GPU parallelization is also exhibited.

Preprint, 2013, arXiv

Monotone false discovery rate

This paper proposes a procedure to obtain monotone estimates of both the local and the tail false discovery rates that arise in large-scale multiple testing. The proposed monotonization is asymptotically optimal for controlling the false discovery rate and also has many attractive finite-sample properties.

Preprint, 2013, arXiv

Regression shrinkage and grouping of highly correlated predictors with HORSES

Identifying homogeneous subgroups of variables can be challenging in high dimensional data analysis with highly correlated predictors. We propose a new method called Hexagonal Operator for Regression with Shrinkage and Equality Selection, HORSES for short, that simultaneously selects positively correlated variables and identifies them as predictive clusters. This is achieved via a constrained least-squares problem with regularization that consists of a linear combination of an $L_1$ penalty for the coefficients and another $L_1$ penalty for pairwise differences of the coefficients. This specification of the penalty function encourages grouping of positively correlated predictors combined with a sparse solution. We construct an efficient algorithm to implement the HORSES procedure. We show via simulation that the proposed method outperforms other variable selection methods in terms of prediction error and parsimony. The technique is demonstrated on two data sets, a small data set from analysis of soil in Appalachia, and a high dimensional data set from a near infrared (NIR) spectroscopy study, showing the flexibility of the methodology.
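The penalty described in this abstract can be sketched in a few lines. The abstract only says "a linear combination" of the two penalties; the weights `alpha` and `1 - alpha` below, the function name, and the example coefficient vectors are assumptions made for illustration, and the sketch evaluates the penalty only, without solving the constrained least-squares problem.

```python
from itertools import combinations


def horses_penalty(beta, alpha):
    """Evaluate alpha * sum_j |b_j| + (1 - alpha) * sum_{j<k} |b_j - b_k|.

    The first term shrinks coefficients toward zero; the second shrinks
    pairs of coefficients toward each other.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l1 = sum(abs(b) for b in beta)
    pairwise = sum(abs(bj - bk) for bj, bk in combinations(beta, 2))
    return alpha * l1 + (1.0 - alpha) * pairwise


# Two hypothetical coefficient vectors with the same L1 norm.
grouped = horses_penalty([1.0, 1.0, 0.0], alpha=0.5)  # two equal coefficients
spread = horses_penalty([2.0, 0.0, 0.0], alpha=0.5)   # weight on one coefficient
print(grouped, spread)
```

Both vectors have the same sum of absolute values, but the one with two equal coefficients incurs a smaller pairwise-difference term, which illustrates why a penalty of this form favors grouping predictors at a common coefficient value.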

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint

Fields this researcher appears in

Methodology, Applications, Computation, Machine Learning

Source provenance

Where this author record came from

arXiv, confidence 95%

external id: arxiv:2512.24611:author:2:johan-lim

Imported May 21, 2026; synced May 21, 2026

arXiv, confidence 95%

external id: arxiv:2512.24642:author:2:johan-lim

Imported May 21, 2026; synced May 21, 2026

4 works

Donghyeon Yu

Researcher

2 works

Dohwan Park

Researcher

2 works

Joong-Ho Won

Researcher

2 works

Kwangok Seo

Researcher
