Researcher profile

Johan Lim

Johan Lim contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified | Verification L1 | Unclaimed author
5 works
0 followers
3 topics
4 close collaborators

Published work

5 published items

preprint | 2025 | arXiv

$\ell_0$-Regularized Item Response Theory Model for Robust Ideal Point Estimation

Ideal point estimation methods face a significant challenge when legislators engage in protest voting, that is, strategically voting against their party to express dissatisfaction. Such votes introduce attenuation bias, making ideologically extreme legislators appear artificially moderate. We propose a novel statistical framework that extends the fast EM-based estimation approach of Imai et al. (2016) with an $\ell_0$ regularization method to handle protest votes. Through simulation studies, we demonstrate that our proposed method maintains estimation accuracy even with high proportions of protest votes, while being substantially faster than MCMC-based methods. Applying our method to the 116th and 117th U.S. House of Representatives, we successfully recover the extreme liberal positions of "the Squad", whose protest votes had caused conventional methods to misclassify them as moderates. While conventional methods rank Ocasio-Cortez as more conservative than 69% of Democrats, our method places her firmly in the progressive wing, aligning with her documented policy positions. This approach provides both robust ideal point estimates and systematic identification of protest votes, facilitating deeper analysis of strategic voting behavior in legislatures.

preprint | 2025 | arXiv

Empirical Bayes Method for Large Scale Multiple Testing with Heteroscedastic Errors

In this paper, we address the normal mean inference problem, which involves testing multiple means of normal random variables with heteroscedastic variances. Most existing empirical Bayes methods for this setting are developed under restrictive assumptions, such as the scaled inverse-chi-squared prior for variances and unimodality for the non-null mean distribution. However, when either of these assumptions is violated, these methods often fail to control the false discovery rate (FDR) at the target level or suffer from a substantial loss of power. To overcome these limitations, we propose a new empirical Bayes method, gg-Mix, which assumes only independence between the normal means and variances, without imposing any structural restrictions on their distributions. We thoroughly evaluate the FDR control and power of gg-Mix through extensive numerical studies and demonstrate its superior performance compared to existing methods. Finally, we apply gg-Mix to three real data examples to further illustrate the practical advantages of our approach.

preprint | 2022 | arXiv

Empirical Likelihood Inference for Area under the ROC Curve using Ranked Set Samples

The area under a receiver operating characteristic curve (AUC) is a useful tool to assess the performance of continuous-scale diagnostic tests on binary classification. In this article, we propose an empirical likelihood (EL) method to construct confidence intervals for the AUC from data collected by ranked set sampling (RSS). The proposed EL-based method enables inferences without assumptions required in existing nonparametric methods and takes advantage of the sampling efficiency of RSS. We show that for both balanced and unbalanced RSS, the EL-based point estimate is the Mann-Whitney statistic, and confidence intervals can be obtained from a scaled chi-square distribution. Simulation studies and two case studies on diabetes and chronic kidney disease data suggest that using the proposed method and RSS enables more efficient inference on the AUC.
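The Mann-Whitney statistic named in this abstract can be illustrated with a minimal sketch. It assumes simple, unranked samples of test scores for diseased and healthy subjects; the function name and inputs are hypothetical, and neither the ranked-set-sampling design nor the empirical likelihood intervals from the paper are reproduced here.

```python
def mann_whitney_auc(diseased, healthy):
    """Estimate the AUC as the fraction of (diseased, healthy) pairs in
    which the diseased score is higher; ties count as one half.

    Hypothetical helper for illustration only.
    """
    total = 0.0
    for x in diseased:
        for y in healthy:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(diseased) * len(healthy))
```

A value near 1 means the test separates the two groups almost perfectly, while a value near 0.5 means it does no better than chance.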

preprint | 2022 | arXiv

Testing Independence of Bivariate Censored Data using Random Walk on Restricted Permutation Graph

In this paper, we propose a procedure to test the independence of bivariate censored data, which is generic and applicable to any censoring type in the literature. To test the hypothesis, we consider a rank-based statistic, Kendall's tau. The censored data define a restricted permutation space of all possible ranks of the observations. The proposed statistic is the average of Kendall's tau over the ranks in the restricted permutation space. To evaluate the statistic and its reference distribution, we develop a Markov chain Monte Carlo (MCMC) procedure to obtain uniform samples on the restricted permutation space and numerically approximate the null distribution of the averaged Kendall's tau. We apply the procedure to three real data examples with different censoring types, and compare the results with those of existing methods. We conclude the paper with some additional discussions not given in the main body of the paper.
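The averaged Kendall's tau can be sketched as follows, assuming the rank vectors consistent with the censoring have already been sampled. The function names are hypothetical, and the MCMC sampler over the restricted permutation space described in the abstract is not reproduced; the sketch only shows the averaging step for tie-free ranks.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau for two equal-length sequences without ties:
    (concordant pairs - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            score += 1
        elif product < 0:
            score -= 1
    return score / (n * (n - 1) / 2)


def average_tau(sampled_ranks):
    """Average tau over sampled (rank_x, rank_y) pairs.

    `sampled_ranks` is a hypothetical list of rank-vector pairs, each
    assumed to be compatible with the observed censoring pattern.
    """
    taus = [kendall_tau(rx, ry) for rx, ry in sampled_ranks]
    return sum(taus) / len(taus)
```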

preprint | 2020 | arXiv

Covariate-dependent control limits for the detection of abnormal price changes in scanner data

Currently, large-scale sales data for consumer goods, called scanner data, are obtained by scanning the bar codes of individual products at the points of sale of retail outlets. Many national statistical offices use scanner data to build consumer price statistics. In this process, as in other statistical procedures, the detection of abnormal transactions in sales prices is an important step in the analysis. Popular methods for conducting such outlier detection are the quartile method, the Hidiroglou-Berthelot method, the resistant fences method, and the Tukey algorithm. These methods are based solely on information about price changes and not on any of the other covariates (e.g., sales volume or types of retail shops) that are also available from scanner data. In this paper, we propose a new method to detect abnormal price changes that takes into account an additional covariate, namely, sales volume. We assume that the variance of the log of the price change is a smooth function of the sales volume and estimate the function from previously observed data. We numerically show the advantages of the new method over existing methods. We also apply the methods to real scanner data collected at weekly intervals by the Korean Chamber of Commerce and Industry between 2013 and 2014 and compare their performance.
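The idea of a control limit that depends on sales volume can be sketched as below. Everything specific here is an assumption made for illustration: the limit is centred at zero, the multiplier `k` is arbitrary, and `sigma_of_volume` stands in for the smooth variance function that the paper estimates from previously observed data. The estimation step itself is not shown.

```python
import math


def flag_abnormal(price_ratio, volume, sigma_of_volume, k=3.0):
    """Flag a price change whose log exceeds k standard deviations,
    where the standard deviation depends on the sales volume.

    `sigma_of_volume` is a hypothetical callable mapping volume to the
    standard deviation of the log price change.
    """
    return abs(math.log(price_ratio)) > k * sigma_of_volume(volume)


def example_sigma(volume):
    """Assumed form for illustration: variability shrinks as volume grows."""
    return 0.5 / math.sqrt(volume)
```

Under this assumed form, a 20% price increase is flagged for a product sold 100 times in the period but not for one sold 4 times, which is the kind of distinction that limits based only on price changes cannot make.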