Source author record

Andrew Connolly

Andrew Connolly appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

astro-ph; astro-ph.CO; astro-ph.IM; Databases; Distributed, Parallel, and Cluster Computing; Machine Learning; physics.ed-ph; physics.pop-ph

Catalog footprint

What is connected

16 works

8 topics

4 close collaborators

preprint · 2022 · arXiv

The Astronomy Commons Platform: A Deployable Cloud-Based Analysis Platform for Astronomy

We present a scalable, cloud-based science platform solution designed to enable next-to-the-data analyses of terabyte-scale astronomical tabular datasets. The presented platform is built on Amazon Web Services (over Kubernetes and S3 abstraction layers), utilizes Apache Spark and the Astronomy eXtensions for Spark for parallel data analysis and manipulation, and provides the familiar JupyterHub web-accessible front-end for user access. We outline the architecture of the analysis platform, provide implementation details, rationale for (and against) technology choices, verify scalability through strong and weak scaling tests, and demonstrate usability through an example science analysis of data from the Zwicky Transient Facility's 1Bn+ light-curve catalog. Furthermore, we show how this system enables an end-user to iteratively build analyses (in Python) that transparently scale processing with no need for end-user interaction. The system is designed to be deployable by astronomers with moderate cloud engineering knowledge, or (ideally) IT groups. Over the past three years, it has been utilized to build science platforms for the DiRAC Institute, the ZTF partnership, the LSST Solar System Science Collaboration, the LSST Interdisciplinary Network for Collaboration and Computing, as well as for numerous short-term events (with over 100 simultaneous users). A live demo instance, the deployment scripts, source code, and cost calculators are accessible at http://hub.astronomycommons.org/.
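
The strong and weak scaling tests mentioned in the abstract are usually summarized as parallel efficiencies. The sketch below uses the standard textbook definitions. The function names and timings are hypothetical and are not taken from the paper.

```python
def strong_scaling_efficiency(t_one, t_n, n_workers):
    """Fixed total problem size: the ideal runtime on n workers is t_one / n."""
    return t_one / (n_workers * t_n)


def weak_scaling_efficiency(t_one, t_n):
    """Problem size grows with the worker count: the ideal runtime stays at t_one."""
    return t_one / t_n


# Hypothetical wall-clock timings in seconds (illustrative only).
strong = strong_scaling_efficiency(t_one=800.0, t_n=125.0, n_workers=8)
weak = weak_scaling_efficiency(t_one=100.0, t_n=125.0)
print(f"strong: {strong:.2f}, weak: {weak:.2f}")
```

An efficiency of 1.0 would mean ideal scaling. Values below 1.0 indicate overhead such as scheduling, shuffling, or I/O contention.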

preprint · 2020 · arXiv

Sampling for Deep Learning Model Diagnosis (Technical Report)

Deep learning (DL) models have achieved paradigm-changing performance in many fields with high-dimensional data, such as images, audio, and text. However, the black-box nature of deep neural networks is not only a barrier to adoption in applications such as medical diagnosis, where interpretability is essential, but also impedes diagnosis of underperforming models. The task of diagnosing or explaining DL models requires the computation of additional artifacts, such as activation values and gradients. These artifacts are large in volume, and their computation, storage, and querying raise significant data management challenges. In this paper, we articulate DL diagnosis as a data management problem, and we propose a general, yet representative, set of queries to evaluate systems that strive to support this new workload. We further develop a novel data sampling technique that produces approximate but accurate results for these model debugging queries. Our sampling technique utilizes the lower-dimensional representation learned by the DL model and focuses on model decision boundaries for the data in this lower-dimensional space. We evaluate our techniques on one standard computer vision and one scientific data set and demonstrate that our sampling technique outperforms a variety of state-of-the-art alternatives in terms of query accuracy.

preprint · 2015 · arXiv

Efficient Iterative Processing in the SciDB Parallel Array Engine

Many scientific data-intensive applications perform iterative computations on array data. There exist multiple engines specialized for array processing. These engines efficiently support various types of operations, but none includes native support for iterative processing. In this paper, we develop a model for iterative array computations and a series of optimizations. We evaluate the benefits of an optimized, native support for iterative array processing on the SciDB engine and real workloads from the astronomy domain.

preprint · 2013 · arXiv

Growth of Cosmic Structure: Probing Dark Energy Beyond Expansion

The quantity and quality of cosmic structure observations have greatly accelerated in recent years. Further leaps forward will be facilitated by imminent projects, which will enable us to map the evolution of dark and baryonic matter density fluctuations over cosmic history. The way that these fluctuations vary over space and time is sensitive to the nature of dark matter and dark energy. Dark energy and gravity both affect how rapidly structure grows; the greater the acceleration, the more suppressed the growth of structure, while the greater the gravity, the more enhanced the growth. While distance measurements also constrain dark energy, the comparison of growth and distance data tests whether General Relativity describes the laws of physics accurately on large scales. Modified gravity models are able to reproduce the distance measurements but at the cost of altering the growth of structure (these signatures are described in more detail in the accompanying paper on Novel Probes of Gravity and Dark Energy). Upcoming surveys will exploit these differences to determine whether the acceleration of the Universe is due to dark energy or to modified gravity. To realize this potential, both wide field imaging and spectroscopic redshift surveys play crucial roles. Projects including DES, eBOSS, DESI, PFS, LSST, Euclid, and WFIRST are in line to map a volume of the Universe exceeding 1000 cubic billion light-years. These will map the cosmic structure growth rate to 1% in the redshift range 0<z<2, over the last 3/4 of the age of the Universe.

preprint · 2013 · arXiv

Snowmass Computing Frontier: Computing for the Cosmic Frontier, Astrophysics, and Cosmology

This document presents (off-line) computing requirements and challenges for Cosmic Frontier science, covering the areas of data management, analysis, and simulations. We invite contributions to extend the range of covered topics and to enhance the current descriptions.

preprint · 2013 · arXiv

The Multi-Object, Fiber-Fed Spectrographs for SDSS and the Baryon Oscillation Spectroscopic Survey

We present the design and performance of the multi-object fiber spectrographs for the Sloan Digital Sky Survey (SDSS) and their upgrade for the Baryon Oscillation Spectroscopic Survey (BOSS). Originally commissioned in Fall 1999 on the 2.5-m aperture Sloan Telescope at Apache Point Observatory, the spectrographs produced more than 1.5 million spectra for the SDSS and SDSS-II surveys, enabling a wide variety of Galactic and extra-galactic science including the first observation of baryon acoustic oscillations in 2005. The spectrographs were upgraded in 2009 and are currently in use for BOSS, the flagship survey of the third-generation SDSS-III project. BOSS will measure redshifts of 1.35 million massive galaxies to redshift 0.7 and Lyman-alpha absorption of 160,000 high redshift quasars over 10,000 square degrees of sky, making percent level measurements of the absolute cosmic distance scale of the Universe and placing tight constraints on the equation of state of dark energy. The twin multi-object fiber spectrographs utilize a simple optical layout with reflective collimators, gratings, all-refractive cameras, and state-of-the-art CCD detectors to produce hundreds of spectra simultaneously in two channels over a bandpass covering the near ultraviolet to the near infrared, with a resolving power R = λ/FWHM ~ 2000. Building on proven heritage, the spectrographs were upgraded for BOSS with volume-phase holographic gratings and modern CCD detectors, improving the peak throughput by nearly a factor of two, extending the bandpass to cover 360 < λ < 1000 nm, and increasing the number of fibers from 640 to 1000 per exposure. In this paper we describe the original SDSS spectrograph design and the upgrades implemented for BOSS, and document the predicted and measured performances.

preprint · 2011 · arXiv

Interpolating Masked Weak Lensing Signal with Karhunen-Loeve Analysis

We explore the utility of Karhunen-Loeve (KL) analysis in solving practical problems in the analysis of gravitational shear surveys. Shear catalogs from large-field weak lensing surveys will be subject to many systematic limitations, notably incomplete coverage and pixel-level masking due to foreground sources. We develop a method to use two-dimensional KL eigenmodes of shear to interpolate noisy shear measurements across masked regions. We explore the results of this method with simulated shear catalogs, using statistics of high-convergence regions in the resulting map. We find that the KL procedure not only minimizes the bias due to masked regions in the field, but also reduces spurious peak counts from shape noise by a factor of ~3 in the cosmologically sensitive regime. This indicates that KL reconstructions of masked shear are not only useful for creating robust convergence maps from masked shear catalogs, but also offer promise of improved parameter constraints within studies of shear peak statistics.

preprint · 2011 · arXiv

Spectroscopic Determination of the Low Redshift Type Ia Supernova Rate from the Sloan Digital Sky Survey

Supernova rates are directly coupled to high-mass stellar birth and evolution. As such, they are one of the few direct measures of the history of cosmic stellar evolution. In this paper we describe a probabilistic technique for identifying supernovae within spectroscopic samples of galaxies. We present a study of 52 type Ia supernovae ranging in age from -14 days to +40 days extracted from a parent sample of $\simeq 50{,}000$ spectra from the SDSS DR5. We find a Supernova Rate (SNR) of $0.472^{+0.048}_{-0.039}\,({\rm systematic})\,^{+0.081}_{-0.071}\,({\rm statistical})$ SNu at a redshift of <z> = 0.1. This value is higher than other values at low redshift at the 1σ level, but is consistent at the 3σ level. The 52 supernova candidates used in this study comprise the third largest sample of supernovae used in a type Ia rate determination to date. In this paper we demonstrate the potential of the described approach for detecting supernovae in future spectroscopic surveys.

preprint · 2010 · arXiv

3D Reconstruction of the Density Field: An SVD Approach to Weak Lensing Tomography

We present a new method for constructing three-dimensional mass maps from gravitational lensing shear data. We solve the lensing inversion problem using truncation of singular values (within the context of generalized least squares estimation) without a priori assumptions about the statistical nature of the signal. This singular value framework allows a quantitative comparison between different filtering methods: we evaluate our method beside the previously explored Wiener filter approaches. Our method yields near-optimal angular resolution of the lensing reconstruction and allows cluster-sized halos to be de-blended robustly. It allows for mass reconstructions which are 2-3 orders of magnitude faster than the Wiener filter approach; in particular, we estimate that an all-sky reconstruction with arcminute resolution could be performed on a time-scale of hours. We find, however, that linear, non-parametric reconstructions have a fundamental limitation in the resolution achieved in the redshift direction.

preprint · 2010 · arXiv

Affordable Digital Planetariums with WorldWide Telescope

Digital planetariums can provide a broader range of educational experiences than the more classical planetariums that use star-balls. This is because of their ability to project images, content from current research and the 3D distribution of the stars and galaxies. While there are hundreds of planetariums in the country, few of these are fully digital because of the cost. In collaboration with Microsoft Research (MSR) we have developed a way to digitize existing planetariums for approximately $40,000 using freely available software. We describe here how off-the-shelf equipment, together with MSR's WorldWide Telescope client, can provide a rich and truly interactive experience. This will enable students and the public to pan through multi-wavelength full-sky scientific data sets, explore 3D visualizations of our Solar System (including trajectories of millions of minor planets), nearby stars, and the SDSS galaxy catalog.

preprint · 2010 · arXiv

Astronomy in the Cloud: Using MapReduce for Image Coaddition

In the coming decade, astronomical surveys of the sky will generate tens of terabytes of images and detect hundreds of millions of sources every night. The study of these sources will involve computational challenges such as anomaly detection and classification, and moving object tracking. Since such studies benefit from the highest quality data, methods such as image coaddition (stacking) will be a critical preprocessing step prior to scientific investigation. With a requirement that these images be analyzed on a nightly basis to identify moving sources or transient objects, these data streams present many computational challenges. Given the quantity of data involved, the computational load of these problems can only be addressed by distributing the workload over a large number of nodes. However, the high data throughput demanded by these applications may present scalability challenges for certain storage architectures. One scalable data-processing method that has emerged in recent years is MapReduce, and in this paper we focus on its popular open-source implementation called Hadoop. In the Hadoop framework, the data is partitioned among storage attached directly to worker nodes, and the processing workload is scheduled in parallel on the nodes that contain the required input data. A further motivation for using Hadoop is that it allows us to exploit cloud computing resources, e.g., Amazon's EC2. We report on our experience implementing a scalable image-processing pipeline for the SDSS imaging database using Hadoop. This multi-terabyte imaging dataset provides a good testbed for algorithm development since its scope and structure approximate future surveys. First, we describe MapReduce and how we adapted image coaddition to the MapReduce framework. Then we describe a number of optimizations to our basic approach and report experimental results comparing their performance.
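
The idea of expressing coaddition as map and reduce steps can be illustrated with a toy, single-process sketch. It is not the authors' Hadoop pipeline. It assumes that exposures are already registered to a common pixel grid and keyed by a hypothetical sky-tile label, and it stacks by a plain unweighted mean.

```python
from collections import defaultdict


def map_image(image):
    """Map step: emit a (sky_tile, pixels) pair for one exposure."""
    return image["tile"], image["pixels"]


def reduce_tile(pixel_lists):
    """Reduce step: average co-located pixels across all exposures of one tile."""
    n = len(pixel_lists)
    return [sum(values) / n for values in zip(*pixel_lists)]


def coadd(images):
    # Shuffle: group mapped outputs by key, as a MapReduce runtime would do.
    groups = defaultdict(list)
    for image in images:
        tile, pixels = map_image(image)
        groups[tile].append(pixels)
    return {tile: reduce_tile(pixel_lists) for tile, pixel_lists in groups.items()}


# Hypothetical, already-aligned exposures flattened to 1D pixel lists.
exposures = [
    {"tile": "A", "pixels": [1.0, 2.0, 3.0]},
    {"tile": "A", "pixels": [3.0, 2.0, 1.0]},
    {"tile": "B", "pixels": [5.0, 5.0, 5.0]},
]
stacked = coadd(exposures)
print(stacked)
```

In a distributed setting the grouping by tile is what lets each reducer work independently on its own patch of sky. A realistic pipeline would also need image registration, background matching, and weighting, all of which are omitted here.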

preprint · 2009 · arXiv

Probing Spectroscopic Variability of Galaxies & Narrow-Line Active Galactic Nuclei in the Sloan Digital Sky Survey

Under the unified model for active galactic nuclei (AGNs), narrow-line (Type 2) AGNs are, in fact, broad-line (Type 1) AGNs but each with a heavily obscured accretion disk. We would therefore expect the optical continuum emission from Type 2 AGN to be composed mainly of stellar light and non-variable on the time-scales of months to years. In this work we probe the spectroscopic variability of galaxies and narrow-line AGNs using the multi-epoch data in the Sloan Digital Sky Survey (SDSS) Data Release 6. The sample contains 18,435 sources for which there exist pairs of spectroscopic observations (with a maximum separation in time of ~700 days) covering a wavelength range of 3900-8900 angstrom. To obtain a reliable repeatability measurement between each spectral pair, we consider a number of techniques for spectrophotometric calibration, resulting in a factor-of-two improvement in the spectrophotometric calibration. From these data we find no obvious continuum and emission-line variability in the narrow-line AGNs on average: the spectroscopic variability of the continuum is 0.07+/-0.26 mag in the g band and, for the emission-line ratios log10([NII]/Halpha) and log10([OIII]/Hbeta), the variability is 0.02+/-0.03 dex and 0.06+/-0.08 dex, respectively. From the continuum variability measurement we set an upper limit on the ratio between the flux of the varying spectral component, presumably related to AGN activity, and that of the host galaxy to be ~30%. We provide the corresponding upper limits for other spectral classes, including those from the BPT diagram, eClass galaxy classification, stars and quasars.

preprint · 2007 · arXiv

Color Tomography

Lensing tomography with multi-color imaging surveys can probe dark energy and the cosmological power spectrum. However, accurate photometric redshifts for tomography out to high redshift require imaging in five or more bands, which is expensive to carry out over thousands of square degrees. Since lensing makes coarse, statistical use of redshift information, we explore the prospects for tomography using limited color information from two or three band imaging. With an appropriate calibration sample, we find that it is feasible to create up to four redshift bins using imaging data in just the g, r and i bands. We construct such redshift sub-samples from mock catalogs by clustering galaxies in color space and discarding regions with poorly-defined redshift distributions. The loss of galaxy number density decreases the accuracy of lensing measurements, but even losing half or more of the galaxies is not a severe loss for large area surveys. We estimate the errors on lensing power spectra and dark energy parameters with color tomography and discuss trade-offs in survey area and filter choice. We discuss the systematic errors that may change our conclusions, especially the information needed to tackle intrinsic alignments.

preprint · 2005 · arXiv

Large Scale Clustering of Sloan Digital Sky Survey Quasars: Impact of the Baryon Density and the Cosmological Constant

We report the first result of the clustering analysis of Sloan Digital Sky Survey (SDSS) quasars. We compute the two-point correlation function (2PCF) of SDSS quasars in redshift space at $8h^{-1}{\rm Mpc} < s < 500h^{-1}{\rm Mpc}$, with particular attention to its baryonic signature. Our sample consists of 19986 quasars extracted from the SDSS Data Release 4 (DR4). The redshift range of the sample is $0.72 \le z \le 2.24$ (the mean redshift is $\bar z = 1.46$) and the reddening-corrected $i$-band apparent magnitude range is $15.0 \le m_{i,{\rm rc}} \le 19.1$. Due to the relatively low number density of the quasar sample, the bump in the power spectrum due to the baryon density, $\Omega_{\rm b}$, is not clearly visible. The effect of the baryon density is, however, to distort the overall shape of the 2PCF. The degree of distortion makes it an interesting alternate measure of the baryonic signature. Assuming a scale-independent linear bias and a spatially flat universe, i.e., $\Omega_{\rm b} + \Omega_{\rm d} + \Omega_\Lambda = 1$, where $\Omega_{\rm d}$ and $\Omega_\Lambda$ denote the density parameters of dark matter and the cosmological constant, we combine the observed quasar 2PCF and the predicted matter 2PCF to put constraints on $\Omega_{\rm b}$ and $\Omega_\Lambda$. Our result is fitted as $0.80 - 2.8\Omega_{\rm b} < \Omega_\Lambda < 0.90 - 1.4\Omega_{\rm b}$ at the 2$\sigma$ confidence level, which is consistent with results from other cosmological observations such as WMAP. (abridged)
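
As a worked illustration of the fitted constraint and the flatness condition quoted above, the sketch below evaluates the allowed band of the cosmological constant density for a given baryon density, and the dark matter density that flatness then implies. The input value 0.04 is an illustrative assumption, not a number from the abstract, and the function names are hypothetical.

```python
def omega_lambda_band(omega_b):
    """2-sigma band from the abstract: 0.80 - 2.8*Ob < OL < 0.90 - 1.4*Ob."""
    lower = 0.80 - 2.8 * omega_b
    upper = 0.90 - 1.4 * omega_b
    return lower, upper


def omega_dark_matter(omega_b, omega_lambda):
    """Flat universe: Ob + Od + OL = 1, solved for the dark matter density Od."""
    return 1.0 - omega_b - omega_lambda


omega_b = 0.04  # illustrative baryon density (assumption)
lam_lo, lam_hi = omega_lambda_band(omega_b)
dm_hi = omega_dark_matter(omega_b, lam_lo)  # smallest OL gives largest Od
dm_lo = omega_dark_matter(omega_b, lam_hi)  # largest OL gives smallest Od
print(f"Omega_Lambda in ({lam_lo:.3f}, {lam_hi:.3f})")
print(f"Omega_d in ({dm_lo:.3f}, {dm_hi:.3f})")
```

Because the two bounds have different slopes in the baryon density, the width of the allowed band changes with the assumed baryon density rather than staying fixed.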

preprint · 2005 · arXiv

The C4 Clustering Algorithm: Clusters of Galaxies in the Sloan Digital Sky Survey

We present the "C4 Cluster Catalog", a new sample of 748 clusters of galaxies identified in the spectroscopic sample of the Second Data Release (DR2) of the Sloan Digital Sky Survey (SDSS). The C4 cluster-finding algorithm identifies clusters as overdensities in a seven-dimensional position and color space, thus minimizing projection effects which plagued previous optical cluster selection. The present C4 catalog covers ~2600 square degrees of sky, ranging from groups containing 10 members to massive clusters having over 200 cluster members with redshifts. We provide cluster properties like sky location, mean redshift, galaxy membership, summed r-band optical luminosity (L_r), velocity dispersion, and measures of substructure. We use new mock galaxy catalogs to investigate the sensitivity to the various algorithm parameters, as well as to quantify purity and completeness. These mock catalogs indicate that the C4 catalog is ~90% complete and 95% pure above M_200 = 1x10^14 solar masses and within 0.03 <= z <= 0.12. The C4 algorithm finds 98% of X-ray identified clusters and 90% of Abell clusters within 0.03 <= z <= 0.12. We show that the L_r of a cluster is a more robust estimator of the halo mass (M_200) than the line-of-sight velocity dispersion or the richness of the cluster. The final SDSS data will provide ~2500 C4 clusters and will represent one of the largest and most homogeneous samples of local clusters.
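
Purity and completeness, as quantified against mock catalogs, can be sketched with their common definitions: purity is the fraction of detections that correspond to a real halo, and completeness is the fraction of real halos that are recovered. The matching scheme, identifiers, and data below are hypothetical and do not reproduce the C4 analysis.

```python
def purity_and_completeness(true_halos, detections):
    """Compute (purity, completeness) for a detection catalog.

    true_halos: set of halo ids present in the mock catalog.
    detections: dict mapping detection id -> matched halo id, or None if unmatched.
    """
    real_matches = [halo for halo in detections.values() if halo in true_halos]
    purity = len(real_matches) / len(detections)
    completeness = len(set(real_matches)) / len(true_halos)
    return purity, completeness


# Hypothetical mock: five true halos, four detections, one of them spurious.
true_halos = {"h1", "h2", "h3", "h4", "h5"}
detections = {"d1": "h1", "d2": "h2", "d3": "h3", "d4": None}
purity, completeness = purity_and_completeness(true_halos, detections)
print(f"purity: {purity:.2f}, completeness: {completeness:.2f}")
```

Using a set for the recovered halos means that two detections matched to the same halo count once toward completeness, while both still count toward purity.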

preprint · 2003 · arXiv

Hdelta-Selected Galaxies in the Sloan Digital Sky Survey I: The Catalog

[Abridged] We present here a new and homogeneous sample of 3340 galaxies selected from the Sloan Digital Sky Survey (SDSS) based solely on the observed strength of their Hdelta absorption line. These galaxies are commonly known as "post-starburst" or "E+A" galaxies, and the study of these galaxies has been severely hampered by the lack of a large, statistical sample of such galaxies. In this paper, we rectify this problem by selecting a sample of galaxies which possess an absorption Hdelta equivalent width of EW(Hdelta_max) - Delta EW(Hdelta_max) > 4 Å from 106682 galaxies in the SDSS. We have performed extensive tests on our catalog including comparing different methodologies of measuring the Hdelta absorption and studying the effects of stellar absorption, dust extinction, emission-filling and measurement error. The measured abundance of our Hdelta-selected (HDS) galaxies is 2.6 +/- 0.1% of all galaxies within a volume-limited sample of 0.05<z<0.1 and M(r*)<-20.5, which is consistent with previous studies of such galaxies in the literature. We find that only 25 of our HDS galaxies in this volume-limited sample (3.5+/-0.7%) show no evidence for OII and Halpha emission, thus indicating that true E+A (or k+a) galaxies are extremely rare objects at low redshift, i.e., only 0.09+/-0.02% of all galaxies in this volume-limited sample are true E+A galaxies. In contrast, 89+/-5% of our HDS galaxies in the volume-limited sample have significant detections of the OII and Halpha emission lines. We find 27 galaxies in our volume-limited HDS sample that possess no detectable OII emission, but do, however, possess detectable Halpha emission. These galaxies may be dusty star-forming galaxies. We provide the community with this new catalog of Hdelta-selected galaxies to aid in the understanding of these galaxies.
