Source author record

John Rice

John Rice appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Applications astro-ph.IM math.HO stat.OT

Catalog footprint

What is connected

3works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2012arXiv

A Brief History of the Statistics Department of the University of California at Berkeley

The early history of our department was dominated by Jerzy Neyman (1894-1981), while the next phase was largely in the hands of Neyman's students, with Erich Lehmann (1917-2009) being a central, long-lived and much-loved member of this group. We are very fortunate in having Constance Reid's biography "Neyman -- From Life" and Erich's "Reminiscences of a Statistician: The Company I Kept" and other historical material documenting the founding and growth of the department, and the people in it. In what follows, we will draw heavily from these sources, describing what seems to us to be a remarkable success story: one person starting "a cell of statistical research and teaching ... not being hampered by any existing traditions and routines" and seeing that cell grow rapidly into a major force in academic statistics worldwide. That it has remained so for (at least) the half-century after its founding is a testament to the strength of Neyman's model for a department of statistics.

preprint2012arXiv

Using Machine Learning for Discovery in Synoptic Survey Imaging

Modern time-domain surveys continuously monitor large swaths of the sky to look for astronomical variability. Astrophysical discovery in such data sets is complicated by the fact that detections of real transient and variable sources are highly outnumbered by bogus detections caused by imperfect subtractions, atmospheric effects and detector artefacts. In this work we present a machine learning (ML) framework for discovery of variability in time-domain imaging surveys. Our ML methods provide probabilistic statements, in near real time, about the degree to which each newly observed source is astrophysically relevant source of variable brightness. We provide details about each of the analysis steps involved, including compilation of the training and testing sets, construction of descriptive image-based and contextual features, and optimization of the feature subset and model tuning parameters. Using a validation set of nearly 30,000 objects from the Palomar Transient Factory, we demonstrate a missed detection rate of at most 7.7% at our chosen false-positive rate of 1% for an optimized ML classifier of 23 features, selected to avoid feature correlation and over-fitting from an initial library of 42 attributes. Importantly, we show that our classification methodology is insensitive to mis-labelled training data up to a contamination of nearly 10%, making it easier to compile sufficient training sets for accurate performance in future surveys. This ML framework, if so adopted, should enable the maximization of scientific gain from future synoptic survey and enable fast follow-up decisions on the vast amounts of streaming data produced by such experiments.

preprint2011arXiv

Active Learning to Overcome Sample Selection Bias: Application to Photometric Variable Star Classification

Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because a) standard assumptions for machine-learned model selection procedures break down and b) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting (IW), co-training (CT), and active learning (AL). We argue that AL---where the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up---is an effective approach and is appropriate for many astronomical applications. For a variable star classification problem on a well-studied set of stars from Hipparcos and OGLE, AL is the optimal method in terms of error rate on the testing data, beating the off-the-shelf classifier by 3.4% and the other proposed methods by at least 3.0%. To aid with manual labeling of variable stars, we developed a web interface which allows for easy light curve visualization and querying of external databases. Finally, we apply active learning to classify variable stars in the ASAS survey, finding dramatic improvement in our agreement with the ACVS catalog, from 65.5% to 79.5%, and a significant increase in the classifier's average confidence for the testing set, from 14.6% to 42.9%, after a few AL iterations.