Source author record

Christopher J. Shallue

Christopher J. Shallue appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Machine Learning, astro-ph.EP, astro-ph.IM, astro-ph.SR, math.CO, math.NT

Catalog footprint

5 works, 6 topics, 4 close collaborators


Preprint, 2022, arXiv

Identifying Exoplanets with Deep Learning. IV. Removing Stellar Activity Signals from Radial Velocity Measurements Using Neural Networks

Exoplanet detection with precise radial velocity (RV) observations is currently limited by spurious RV signals introduced by stellar activity. We show that machine learning techniques such as linear regression and neural networks can effectively remove the activity signals (due to starspots/faculae) from RV observations. Previous efforts focused on carefully filtering out activity signals in time using modeling techniques like Gaussian Process regression (e.g., Haywood et al. 2014). Instead, we systematically remove activity signals using only changes to the average shape of spectral lines, and no information about when the observations were collected. We trained our machine learning models on both simulated data (generated with the SOAP 2.0 software; Dumusque et al. 2014) and observations of the Sun from the HARPS-N Solar Telescope (Dumusque et al. 2015; Phillips et al. 2016; Collier Cameron et al. 2019). We find that these techniques can predict and remove stellar activity both from simulated data (improving the RV scatter from 82 cm/s to 3 cm/s) and from more than 600 real observations taken nearly daily over three years with the HARPS-N Solar Telescope (improving the RV scatter from 1.753 m/s to 1.039 m/s, a factor of ~1.7 improvement). In the future, these or similar techniques could remove activity signals from observations of stars outside our solar system and eventually help detect habitable-zone Earth-mass exoplanets around Sun-like stars.
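The linear-regression variant of this idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: each observation is assumed to come with a vector of line-shape features (the synthetic `features` array below is a hypothetical stand-in), a linear model maps those features to RV, and the prediction is subtracted. No timestamps enter the model. For brevity the fit is evaluated in-sample.

```python
import numpy as np

def fit_activity_model(shape_features, rv):
    """Least-squares fit of rv ~ shape_features @ w + b (no time information used)."""
    X = np.column_stack([shape_features, np.ones(len(rv))])
    w, *_ = np.linalg.lstsq(X, rv, rcond=None)
    return w

def remove_activity(shape_features, rv, w):
    """Subtract the activity RV predicted from line-shape features alone."""
    X = np.column_stack([shape_features, np.ones(len(rv))])
    return rv - X @ w

def rms(x):
    """RV scatter: root-mean-square deviation about the mean."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean((x - x.mean()) ** 2)))

# Hypothetical synthetic data (not the paper's): 600 observations, 3 shape features.
rng = np.random.default_rng(0)
features = rng.normal(size=(600, 3))                   # stand-ins for line-shape changes
activity_rv = features @ np.array([0.8, -0.5, 0.3])    # activity-induced RV, m/s
rv = activity_rv + rng.normal(scale=0.1, size=600)     # plus white noise, m/s

w = fit_activity_model(features, rv)
cleaned = remove_activity(features, rv, w)
print(f"RV scatter: {rms(rv):.3f} m/s -> {rms(cleaned):.3f} m/s")
```

A neural network would replace the linear map with a nonlinear one; the subtract-the-prediction structure stays the same.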

Preprint, 2022, arXiv

The EXPRES Stellar Signals Project II. State of the Field in Disentangling Photospheric Velocities

Measured spectral shifts due to intrinsic stellar variability (e.g., pulsations, granulation) and activity (e.g., spots, plages) are the largest source of error for extreme precision radial velocity (EPRV) exoplanet detection. Several methods are designed to disentangle stellar signals from true center-of-mass shifts due to planets. The EXPRES Stellar Signals Project (ESSP) presents a self-consistent comparison of 22 different methods tested on the same extreme-precision spectroscopic data from EXPRES. Methods derived new activity indicators, constructed models for mapping an indicator to the needed RV correction, or separated out shape- and shift-driven RV components. Since no ground truth is known when using real data, relative method performance is assessed using the total and nightly scatter of returned RVs and agreement between the results of different methods. Nearly all submitted methods return a lower RV RMS than classic linear decorrelation, but no method yet consistently reduces the RV RMS to sub-meter-per-second levels. There is a concerning lack of agreement between the RVs returned by different methods. These results suggest that continued progress in this field necessitates increased interpretability of methods, high-cadence data to capture stellar signals at all timescales, and continued tests like the ESSP using consistent data sets with more advanced metrics for method performance. Future comparisons should make use of various well-characterized data sets (such as solar data or data with known injected planetary and/or stellar signals) to better understand method performance and whether planetary signals are preserved.
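The baseline and the two scatter metrics mentioned above can be sketched as follows. This is a hedged illustration: "classic linear decorrelation" is taken to mean subtracting a straight-line fit of RV against a single activity indicator, and "nightly scatter" is assumed here to be the pooled RMS of RVs about each night's mean. The ESSP paper's exact metric definitions may differ, and the demo data are hypothetical.

```python
import numpy as np

def rv_rms(rv):
    """Total scatter: RMS of the RVs about their mean."""
    rv = np.asarray(rv, dtype=float)
    return float(np.sqrt(np.mean((rv - rv.mean()) ** 2)))

def nightly_scatter(rv, night):
    """Within-night scatter (assumed definition): RMS of RVs about each night's
    mean, pooled over nights with at least two exposures."""
    rv = np.asarray(rv, dtype=float)
    night = np.asarray(night)
    residuals = [rv[night == n] - rv[night == n].mean()
                 for n in np.unique(night) if np.count_nonzero(night == n) > 1]
    if not residuals:
        return float("nan")
    return float(np.sqrt(np.mean(np.concatenate(residuals) ** 2)))

def linear_decorrelation(rv, indicator):
    """Baseline: subtract the best-fit line of RV versus one activity indicator."""
    indicator = np.asarray(indicator, dtype=float)
    slope, intercept = np.polyfit(indicator, rv, deg=1)
    return np.asarray(rv, dtype=float) - (slope * indicator + intercept)

# Hypothetical demo: 30 nights x 4 exposures, RV dominated by an indicator-correlated signal.
rng = np.random.default_rng(1)
night = np.repeat(np.arange(30), 4)
indicator = rng.normal(size=night.size)
rv = 2.0 * indicator + rng.normal(scale=0.3, size=night.size)
corrected = linear_decorrelation(rv, indicator)
print(rv_rms(rv), rv_rms(corrected), nightly_scatter(corrected, night))
```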

Preprint, 2020, arXiv

Faster Neural Network Training with Data Echoing

In the twilight of Moore's law, GPUs and other specialized hardware accelerators have dramatically sped up neural network training. However, earlier stages of the training pipeline, such as disk I/O and data preprocessing, do not run on accelerators. As accelerators continue to improve, these earlier stages will increasingly become the bottleneck. In this paper, we introduce "data echoing," which reduces the total computation used by earlier pipeline stages and speeds up training whenever computation upstream from accelerators dominates the training time. Data echoing reuses (or "echoes") intermediate outputs from earlier pipeline stages in order to reclaim idle capacity. We investigate the behavior of different data echoing algorithms on various workloads, for various amounts of echoing, and for various batch sizes. We find that in all settings, at least one data echoing algorithm can match the baseline's predictive performance using less upstream computation. We measured a factor of 3.25 decrease in wall-clock time for ResNet-50 on ImageNet when reading training data over a network.
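The core idea of data echoing can be sketched with Python generators. This is a minimal sketch, not the paper's implementation: the paper compares several echoing variants (for instance, echoing individual examples or whole batches). This sketch shows only example-level echoing followed by a fixed-size shuffle buffer, similar in spirit to `tf.data`'s buffered shuffle. `expensive_preprocess` is a hypothetical stand-in for disk I/O and preprocessing.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def echo(upstream: Iterable[T], echo_factor: int) -> Iterator[T]:
    """Yield each upstream item echo_factor times, so upstream work is reused."""
    for item in upstream:
        for _ in range(echo_factor):
            yield item

def shuffle_buffer(stream: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Approximately shuffle a stream using a fixed-size buffer, so that
    repeated (echoed) items are less likely to appear back to back."""
    rng = random.Random(seed)
    buf: List[T] = []
    for item in stream:
        buf.append(item)
        if len(buf) >= buffer_size:
            idx = rng.randrange(len(buf))
            buf[idx], buf[-1] = buf[-1], buf[idx]  # move a random item to the end
            yield buf.pop()
    rng.shuffle(buf)
    yield from buf

calls = 0

def expensive_preprocess(i: int) -> int:
    """Hypothetical upstream stage; counts how often it actually runs."""
    global calls
    calls += 1
    return i * 10

upstream = (expensive_preprocess(i) for i in range(100))
stream = list(shuffle_buffer(echo(upstream, echo_factor=2), buffer_size=16))
print(len(stream), calls)  # the accelerator sees twice as many examples as were preprocessed
```

With `echo_factor=2`, the downstream stage sees 200 examples while the expensive stage runs only 100 times. Whether the repeated data helps as much as fresh data is the empirical question the paper studies.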

Preprint, 2020, arXiv

On Empirical Comparisons of Optimizers for Deep Learning

Selecting an optimizer is a central step in the contemporary deep learning pipeline. In this paper, we demonstrate the sensitivity of optimizer comparisons to the hyperparameter tuning protocol. Our findings suggest that the hyperparameter search space may be the single most important factor explaining the rankings obtained by recent empirical comparisons in the literature. In fact, we show that these results can be contradicted when hyperparameter search spaces are changed. As tuning effort grows without bound, more general optimizers should never underperform the ones they can approximate (e.g., Adam should never perform worse than momentum), but recent attempts to compare optimizers either assume these inclusion relationships are not practically relevant or restrict the hyperparameters in ways that break the inclusions. In our experiments, we find that inclusion relationships between optimizers matter in practice and always predict optimizer comparisons. In particular, we find that the popular adaptive gradient methods never underperform momentum or gradient descent. We also report practical tips on tuning often-ignored hyperparameters of adaptive gradient methods and raise concerns about fairly benchmarking optimizers for neural network training.
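One way to see why Adam can approximate momentum is to make its epsilon very large. The sketch below rests on a stated simplification: Adam is written without bias correction, so the equivalence is clean. When `eps` dominates `sqrt(v)`, Adam's step is about `(lr / eps) * (1 - beta1) * buf`, where `buf` is the heavy-ball momentum buffer. Choosing `lr_adam = lr_mom * eps / (1 - beta1)` therefore reproduces momentum. The quadratic loss is a hypothetical test problem.

```python
import numpy as np

A = np.diag([1.0, 10.0])  # hypothetical ill-conditioned quadratic: loss = 0.5 * theta^T A theta

def grad(theta):
    return A @ theta

def momentum(theta, lr, mu, steps):
    """Heavy-ball momentum: buf <- mu * buf + g; theta <- theta - lr * buf."""
    buf = np.zeros_like(theta)
    for _ in range(steps):
        buf = mu * buf + grad(theta)
        theta = theta - lr * buf
    return theta

def adam_no_bias_correction(theta, lr, beta1, beta2, eps, steps):
    """Adam update rule with bias correction omitted (a simplification assumed here)."""
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for _ in range(steps):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        theta = theta - lr * m / (np.sqrt(v) + eps)
    return theta

theta0 = np.array([1.0, 1.0])
lr_mom, mu, eps = 0.01, 0.9, 1e10
lr_adam = lr_mom * eps / (1 - mu)  # rescale so Adam's large-eps step matches momentum's
theta_mom = momentum(theta0, lr_mom, mu, 200)
theta_adam = adam_no_bias_correction(theta0, lr_adam, mu, 0.999, eps, 200)
print(theta_mom, theta_adam)
```

This is why restricting a search space (for example, fixing `eps` at a small default) can break an inclusion relationship that holds in principle.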

Preprint, 2012, arXiv

Permutation polynomials of finite fields

Let $\mathbb{F}_q$ be the finite field of $q$ elements. Then a \emph{permutation polynomial} (PP) of $\mathbb{F}_q$ is a polynomial $f \in \mathbb{F}_q[x]$ such that the associated function $c \mapsto f(c)$ is a permutation of the elements of $\mathbb{F}_q$. In 1897 Dickson gave what he claimed to be a complete list of PPs of degree at most 6; however, there have been recent suggestions that this classification might be incomplete. Unfortunately, Dickson's claim of a full characterisation is not easily verified because his published proof is difficult to follow, mainly due to antiquated terminology. In this project we present a full reconstruction of the classification of degree 6 PPs, which, combined with a recent paper by Li \emph{et al.}, finally puts to rest the characterisation problem of PPs of degree up to 6. In addition, we give a survey of the major results on PPs since Dickson's 1897 paper. Particular emphasis is placed on the proof of the so-called \emph{Carlitz Conjecture}, which states that if $q$ is odd and `large' and $n$ is even then there are no PPs of degree $n$. This important result was resolved in the affirmative by research spanning three decades. A generalisation of Carlitz's conjecture due to Mullen proposes that if $q$ is odd and `large' and $n$ is even then no polynomial of degree $n$ is `close' to being a PP. This has remained an unresolved problem in the published literature. We provide a counterexample to Mullen's conjecture, and also point out how recent results imply a more general version of this statement (provided one increases what is meant by $q$ being `large').
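The definition above can be checked by brute force for small fields. This sketch assumes $q = p$ is prime, so that arithmetic mod $p$ is field arithmetic. Prime-power $q$ would need extension-field arithmetic, which is omitted here. The function name `is_permutation_polynomial` is hypothetical. The usage lines check the standard fact that $x^n$ permutes $\mathbb{F}_p$ exactly when $\gcd(n, p-1) = 1$, and illustrate the smallest even degree: over odd $p$, no quadratic is a PP.

```python
from itertools import product
from math import gcd

def is_permutation_polynomial(coeffs, p):
    """Return True if f(x) = sum(coeffs[i] * x**i) permutes the prime field F_p.

    Brute force over all p field elements; valid only for prime p.
    """
    images = set()
    for c in range(p):
        value = 0
        for a in reversed(coeffs):  # Horner's rule, reducing mod p at each step
            value = (value * c + a) % p
        images.add(value)
    return len(images) == p

# x**3 permutes F_5 (gcd(3, 4) = 1) but not F_7 (gcd(3, 6) = 3).
print(is_permutation_polynomial([0, 0, 0, 1], 5), is_permutation_polynomial([0, 0, 0, 1], 7))

# Over F_7, no monic quadratic x**2 + b*x + c is a permutation polynomial.
print(any(is_permutation_polynomial([c, b, 1], 7) for b, c in product(range(7), repeat=2)))
```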