Source author record

Jae-kwang Kim

Jae-kwang Kim appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Methodology, Applications, Machine Learning

Catalog footprint

What is connected

5 works

3 topics

4 close collaborators

preprint, 2021, arXiv

Hypotheses Testing from Complex Survey Data Using Bootstrap Weights: A Unified Approach

Standard statistical methods that do not take proper account of the complexity of survey design can lead to erroneous inferences when applied to survey data due to unequal selection probabilities, clustering, and other design features. In particular, the actual type I error rates of tests of hypotheses using standard methods can be much larger than the nominal significance level. Methods that take account of survey design features in testing hypotheses have been proposed, including Wald tests and quasi-score tests that involve the estimated covariance matrices of parameter estimates. In this paper, we present a unified approach to hypothesis testing that does not require computing the covariance matrices by constructing bootstrap approximations to weighted likelihood ratio statistics and weighted quasi-score statistics, and we establish the asymptotic validity of the proposed bootstrap tests. We also consider hypothesis testing from categorical data and present a bootstrap procedure for testing simple goodness of fit and independence in a two-way table. In the simulation studies, the type I error rates of the proposed approach are much closer to the nominal significance level than those of the naive likelihood-ratio test and quasi-score test. An application to data from an educational survey under a logistic regression model is also presented.

preprint, 2021, arXiv

Statistical Inference after Kernel Ridge Regression Imputation under item nonresponse

Imputation is a popular technique for handling missing data. We consider a nonparametric approach to imputation using the kernel ridge regression technique and propose consistent variance estimation. The proposed variance estimator is based on a linearization approach which employs the entropy method to estimate the density ratio. The root-n consistency of the imputation estimator is established when a Sobolev space is utilized in the kernel ridge regression imputation, which enables us to develop the proposed variance estimator. Synthetic data experiments are presented to confirm our theory.
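As a rough illustration of the imputation step described in this abstract, the sketch below fits a kernel ridge regression on the respondents and fills in predicted values for the nonrespondents before averaging. It rests on several assumptions that are not in the abstract: a Gaussian kernel is swapped in for the Sobolev space kernel the authors use, the function names and the default values of `lam` and `bandwidth` are hypothetical, and the linearization variance estimator is not implemented.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between 1-D arrays a and b (assumed kernel choice)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / bandwidth) ** 2)


def krr_impute_mean(x, y, observed, lam=1e-3, bandwidth=0.2):
    """Impute missing y by kernel ridge regression on x, then average.

    x        : covariate, observed for every unit
    y        : study variable; entries where observed is False are ignored
    observed : boolean response indicator
    Returns (estimated mean, completed y vector).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)

    x_obs, y_obs = x[observed], y[observed]
    n_obs = x_obs.size

    # Ridge-regularized solve: alpha = (K + n * lam * I)^{-1} y
    K = gaussian_kernel(x_obs, x_obs, bandwidth)
    alpha = np.linalg.solve(K + n_obs * lam * np.eye(n_obs), y_obs)

    # Predicted regression function at every unit
    m_hat = gaussian_kernel(x, x_obs, bandwidth) @ alpha

    # Keep observed values, use predictions only where y is missing
    y_completed = np.where(observed, y, m_hat)
    return y_completed.mean(), y_completed
```

The mean of the completed vector is the imputation estimator in its simplest form; a valid standard error would need the variance estimator developed in the paper, which this sketch does not attempt.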

preprint, 2020, arXiv

Data Integration by combining big data and survey sample data for finite population inference

The statistical challenges in using big data for making valid statistical inference in the finite population have been well documented in the literature. These challenges are due primarily to statistical bias arising from under-coverage in the big data source to represent the population of interest and measurement errors in the variables available in the data set. By stratifying the population into a big data stratum and a missing data stratum, we can estimate the missing data stratum by using a fully responding probability sample, and hence the population as a whole by using a data integration estimator. By expressing the data integration estimator as a regression estimator, we can handle measurement errors in the variables in big data and also in the probability sample. We also propose a fully nonparametric classification method for identifying the overlapping units and develop a bias-corrected data integration estimator under misclassification errors. Finally, we develop a two-step regression data integration estimator to deal with measurement errors in the probability sample. An advantage of the approach advocated in this paper is that we do not have to make unrealistic missing-at-random assumptions for the methods to work. The proposed method is applied to a real data example using 2015-16 Australian Agricultural Census data.
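The stratification idea in this abstract can be sketched in a few lines: the big data stratum contributes its exact total, and the probability sample, restricted to units that are not in the big data source, estimates the total of the missing data stratum through its design weights. The sketch assumes no measurement error and no misclassification of the overlap indicator; the function and argument names are hypothetical, and the regression and bias-corrected versions mentioned in the abstract are not covered.

```python
import numpy as np


def data_integration_total(y_big, y_sample, d_sample, in_big_sample):
    """Estimate a population total from a big data source plus a probability sample.

    y_big         : study variable for every unit in the big data source
    y_sample      : study variable for the probability sample units
    d_sample      : design weights (inverse inclusion probabilities) of the sample
    in_big_sample : True where a sample unit also appears in the big data source
    """
    y_big = np.asarray(y_big, dtype=float)
    y_sample = np.asarray(y_sample, dtype=float)
    d_sample = np.asarray(d_sample, dtype=float)
    in_big = np.asarray(in_big_sample, dtype=bool)

    # Big data stratum: total is known exactly (assuming no measurement error)
    big_total = y_big.sum()

    # Missing data stratum: weighted total over sample units outside the big data
    missing_total = np.sum(d_sample[~in_big] * y_sample[~in_big])

    return big_total + missing_total
```

For example, with `y_big = [1, 2, 3]`, a sample of three units with `y_sample = [2, 10, 20]`, weights of 4 each, and only the first sample unit overlapping the big data, the estimate is 6 + 4·10 + 4·20 = 126.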

preprint, 2015, arXiv

Statistical Matching using Fractional Imputation

Statistical matching is a technique for integrating two or more data sets when information available for matching records for individual participants across data sets is incomplete. Statistical matching can be viewed as a missing data problem where a researcher wants to perform a joint analysis of variables that are never jointly observed. A conditional independence assumption is often used to create imputed data for statistical matching. We consider an alternative approach to statistical matching without using the conditional independence assumption. We apply parametric fractional imputation of Kim (2011) to create imputed data using an instrumental variable assumption to identify the joint distribution. We also present variance estimators appropriate for the imputation procedure. We explain how the method applies directly to the analysis of data from split questionnaire designs and measurement error models.

preprint, 2014, arXiv

Two-phase sampling experiment for propensity score estimation in self-selected samples

Self-selected samples are frequently obtained due to different levels of survey participation propensity of the survey individuals. When the survey participation is related to the survey topic of interest, propensity score weighting adjustment using auxiliary information may lead to biased estimation. In this paper, we consider a parametric model for the response probability that includes the study variable itself in the covariates of the model and propose a novel application of two-phase sampling to estimate the parameters of the propensity model. The proposed method includes an experiment in which data are collected again from a subset of the original self-selected sample. With this two-phase sampling experiment, we can estimate the parameters in a propensity score model consistently. Then the propensity score adjustment can be applied to the self-selected sample to estimate the population parameters. Sensitivity to the selection model assumption is investigated in two limited simulation studies. The proposed method is applied to the 2012 Iowa Caucus Survey.
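The final adjustment step in this abstract, applying estimated propensities to the self-selected sample, might look like the sketch below. The abstract does not state the form of the response model or of the estimator, so the logistic model in the study variable, the ratio (Hájek-type) weighted mean, and the names `psa_mean`, `phi0` and `phi1` are all assumptions made for illustration. The parameters are taken as already estimated; the two-phase experiment that estimates them is not shown.

```python
import numpy as np


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def psa_mean(y, phi0, phi1):
    """Propensity-score-adjusted mean for a self-selected sample.

    Assumed response model: P(participate | y) = expit(phi0 + phi1 * y),
    with (phi0, phi1) supplied by the caller.
    """
    y = np.asarray(y, dtype=float)

    # Estimated participation probability for each sampled unit
    propensity = expit(phi0 + phi1 * y)

    # Inverse-propensity weights, normalized in a ratio estimator
    weights = 1.0 / propensity
    return np.sum(weights * y) / np.sum(weights)
```

With `phi1 = 0` the weights are constant and the result is the unweighted sample mean; a positive `phi1` down-weights units with large `y`, which are over-represented under the assumed model.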