Source author record

Hyungsuk Tak

Hyungsuk Tak appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

astro-ph.IM, Methodology, astro-ph.CO, Computation

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

preprint · 2022 · arXiv

Incorporating Measurement Error in Astronomical Object Classification

Most general-purpose classification methods, such as support-vector machines (SVM) and random forests (RF), fail to account for an unusual characteristic of astronomical data: known measurement error uncertainties. In astronomical data, this information is often provided but discarded because popular machine learning classifiers cannot incorporate it. We propose a simulation-based approach that incorporates heteroscedastic measurement error into existing classification methods to better quantify uncertainty in classification. The proposed method first simulates perturbed realizations of the data from a Bayesian posterior predictive distribution of a Gaussian measurement error model. Then, a chosen classifier is fit to each simulation. The variation across the simulations naturally reflects the uncertainty propagated from the measurement errors in both labeled and unlabeled data sets. We demonstrate the use of this approach via two numerical studies. The first is a thorough simulation study applying the proposed procedure to SVM and RF, which are well-known hard and soft classifiers, respectively. The second study is a realistic classification problem of identifying high-$z$ $(2.9 \leq z \leq 5.1)$ quasar candidates from photometric data. The data are from merged catalogs of the Sloan Digital Sky Survey, the Spitzer IRAC Equatorial Survey, and the Spitzer-HETDEX Exploratory Large-Area Survey. The proposed approach reveals that out of 11,847 high-$z$ quasar candidates identified by a random forest without incorporating measurement error, 3,146 are potential misclassifications once measurement error is incorporated. Additionally, out of 1.85 million objects not identified as high-$z$ quasars without measurement error, 936 can be considered new candidates once measurement error is incorporated.
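The simulate-then-refit idea in this abstract can be sketched in a few lines. The following is only a minimal illustration under stated assumptions, not the authors' implementation: each observed feature is perturbed with Gaussian noise whose standard deviation is its reported measurement error (a simplified stand-in for the Bayesian posterior predictive draw), and a toy nearest-centroid classifier stands in for SVM or RF. All function names (`fit_centroids`, `predict`, `perturb`, `class_frequencies`) and the example data are hypothetical.

```python
import random


def fit_centroids(X, y):
    """Toy nearest-centroid classifier: return {label: mean feature vector}."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest to `row`."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def perturb(X, S, rng):
    """Draw one perturbed realization: x_sim ~ N(x_obs, s^2), per feature.

    Assumption: a simplified stand-in for the posterior predictive draw.
    """
    return [[rng.gauss(x, s) for x, s in zip(row, srow)]
            for row, srow in zip(X, S)]


def class_frequencies(X_train, S_train, y_train, X_new, S_new,
                      n_sim=200, seed=0):
    """Refit on each simulated data set and tally predicted labels.

    Both the labeled and the unlabeled data are perturbed, so the spread of
    the tallies reflects measurement error from both sources.
    """
    rng = random.Random(seed)
    votes = [dict() for _ in X_new]
    for _ in range(n_sim):
        model = fit_centroids(perturb(X_train, S_train, rng), y_train)
        for tally, row in zip(votes, perturb(X_new, S_new, rng)):
            label = predict(model, row)
            tally[label] = tally.get(label, 0) + 1
    return [{lab: n / n_sim for lab, n in tally.items()} for tally in votes]


# Hypothetical two-feature example: features X with per-feature errors S.
X_train = [[0.0, 0.1], [0.2, -0.1], [3.0, 3.1], [2.9, 2.8]]
S_train = [[0.1, 0.1]] * 4
y_train = ["star", "star", "quasar", "quasar"]
X_new = [[0.1, 0.0], [1.5, 1.5]]
S_new = [[0.1, 0.1], [1.0, 1.0]]  # second object is poorly measured

freqs = class_frequencies(X_train, S_train, y_train, X_new, S_new)
for f in freqs:
    print(f)
```

In this sketch the well-measured first object should receive nearly the same label in every simulation, while the noisy second object, which sits near the decision boundary, splits its votes between classes. That split is the kind of classification uncertainty a single fit to the unperturbed data would hide.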

preprint · 2020 · arXiv

Data transforming augmentation for heteroscedastic models

Data augmentation (DA) turns seemingly intractable computational problems into simple ones by augmenting latent missing data. In addition to computational simplicity, it is now well established that DA equipped with a deterministic transformation can improve the convergence speed of iterative algorithms such as an EM algorithm or Gibbs sampler. In this article, we outline a framework for transformation-based DA, which we call data transforming augmentation (DTA), allowing augmented data to be a deterministic function of latent and observed data and unknown parameters. Under this framework, we investigate a novel DTA scheme that turns heteroscedastic models into homoscedastic ones to take advantage of simpler computations typically available in homoscedastic cases. Applying this DTA scheme to fitting linear mixed models, we demonstrate simpler computations and faster convergence rates of the resulting iterative algorithms, compared with those under a non-transformation-based DA scheme. We also fit a Beta-Binomial model using the proposed DTA scheme, which enables sampling approximate marginal posterior distributions that are available only under homoscedasticity. An R package, Rdta, is publicly available on CRAN.

preprint · 2014 · arXiv

Strong Lens Time Delay Challenge: II. Results of TDC1

We present the results of the first strong lens time delay challenge. The motivation, experimental design, and entry level challenge are described in a companion paper. This paper presents the main challenge, TDC1, which consisted of analyzing thousands of simulated light curves blindly. The observational properties of the light curves cover the range in quality obtained for current targeted efforts (e.g., COSMOGRAIL) and expected from future synoptic surveys (e.g., LSST), and include simulated systematic errors. \nteamsA\ teams participated in TDC1, submitting results from \nmethods\ different method variants. After describing each method, we compute and analyze basic statistics measuring accuracy (or bias) $A$, goodness of fit $\chi^2$, precision $P$, and success rate $f$. For some methods we identify outliers as an important issue. Other methods show that outliers can be controlled via visual inspection or conservative quality control. Several methods are competitive, i.e., give $|A|<0.03$, $P<0.03$, and $\chi^2<1.5$, with some of the methods already reaching sub-percent accuracy. The fraction of light curves yielding a time delay measurement is typically in the range $f = 20$--$40\%$. It depends strongly on the quality of the data: COSMOGRAIL-quality cadence and light curve lengths yield significantly higher $f$ than does sparser sampling. Taking the results of TDC1 at face value, we estimate that LSST should provide around 400 robust time-delay measurements, each with $P<0.03$ and $|A|<0.01$, comparable to current lens modeling uncertainties. In terms of observing strategies, we find that $A$ and $f$ depend mostly on season length, while $P$ depends mostly on cadence and campaign duration.
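The abstract names the four summary statistics but does not define them. As a rough sketch, the snippet below computes them under one common reading, which is an assumption here rather than something stated in this record: $f$ is the fraction of light curves with a submitted delay, and $A$, $P$, and $\chi^2$ are averages over the submitted systems of the fractional error, the fractional claimed uncertainty, and the squared normalized residual. The function names and the example numbers are hypothetical.

```python
def tdc_metrics(true_delays, est_delays, est_errors, n_total):
    """Summary statistics for a set of submitted time-delay estimates.

    Assumed definitions (not given in the abstract), averaged over the
    n submitted systems out of n_total light curves:
      f    = n / n_total
      A    = mean((est - true) / true)       accuracy (bias)
      P    = mean(err / |true|)              claimed precision
      chi2 = mean(((est - true) / err)**2)   goodness of fit
    """
    n = len(true_delays)
    triples = list(zip(true_delays, est_delays, est_errors))
    return {
        "f": n / n_total,
        "A": sum((e - t) / t for t, e, s in triples) / n,
        "P": sum(s / abs(t) for t, e, s in triples) / n,
        "chi2": sum(((e - t) / s) ** 2 for t, e, s in triples) / n,
    }


def is_competitive(m):
    """Thresholds quoted in the abstract: |A| < 0.03, P < 0.03, chi2 < 1.5."""
    return abs(m["A"]) < 0.03 and m["P"] < 0.03 and m["chi2"] < 1.5


# Hypothetical submission: 3 delays (in days) measured out of 10 light curves.
metrics = tdc_metrics(
    true_delays=[10.0, -20.0, 50.0],
    est_delays=[10.5, -19.0, 49.0],
    est_errors=[0.5, 1.0, 2.0],
    n_total=10,
)
print(metrics, is_competitive(metrics))
```

With these made-up numbers the bias and goodness of fit pass the quoted thresholds, but the mean claimed precision is about 0.047, so the submission would not count as competitive under the $P<0.03$ criterion.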