Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
36 works
0 followers
19 topics
4 close collaborators

Published work

36 published items

preprint 2025 arXiv

Galaxy-Multiplet Clustering from DESI DR2

We present an efficient estimator for higher-order galaxy clustering using small groups of nearby galaxies, or multiplets. Using the Luminous Red Galaxy (LRG) sample from the Dark Energy Spectroscopic Instrument (DESI) Data Release 2, we identify galaxy multiplets as discrete objects and measure their cross-correlations with the general galaxy field. Our results show that the multiplets exhibit stronger clustering bias as they trace more massive dark matter halos than individual galaxies. When comparing the observed clustering statistics with the mock catalogs generated from the N-body simulation AbacusSummit, we find that the mocks underpredict multiplet clustering despite reproducing the galaxy two-point auto-correlation reasonably well. This discrepancy indicates that the standard Halo Occupation Distribution (HOD) model is insufficient to describe the properties of galaxy multiplets, revealing the greater constraining power of this higher-order statistic on galaxy-halo connection and the possibility that multiplets are specific to additional assembly bias. We demonstrate that incorporating secondary biases into the HOD model improves agreement with the observed multiplet statistics, specifically by allowing galaxies to preferentially occupy halos in denser environments. Our results highlight the potential of utilizing multiplet clustering, beyond traditional two-point correlation measurements, to break degeneracies in models describing the galaxy-dark matter connection.

preprint 2024 arXiv

The Gravitational Lensing Imprints of DES Y3 Superstructures on the CMB: A Matched Filtering Approach

Low-density cosmic voids gravitationally lens the cosmic microwave background (CMB), leaving a negative imprint on the CMB convergence $κ$. This effect provides insight into the distribution of matter within voids, and can also be used to study the growth of structure. We measure this lensing imprint by cross-correlating the Planck CMB lensing convergence map with voids identified in the Dark Energy Survey Year 3 data set, covering approximately 4,200 deg$^2$ of the sky. We use two distinct void-finding algorithms: a 2D void-finder which operates on the projected galaxy density field in thin redshift shells, and a new code, Voxel, which operates on the full 3D map of galaxy positions. We employ an optimal matched filtering method for cross-correlation, using the MICE N-body simulation both to establish the template for the matched filter and to calibrate detection significances. Using the DES Y3 photometric luminous red galaxy sample, we measure $A_κ$, the amplitude of the observed lensing signal relative to the simulation template, obtaining $A_κ= 1.03 \pm 0.22$ ($4.6σ$ significance) for Voxel and $A_κ= 1.02 \pm 0.17$ ($5.9σ$ significance) for 2D voids, both consistent with $Λ$CDM expectations. We additionally invert the 2D void-finding process to identify superclusters in the projected density field, for which we measure $A_κ= 0.87 \pm 0.15$ ($5.9σ$ significance). The leading source of noise in our measurements is Planck noise, implying that future data from the Atacama Cosmology Telescope (ACT), South Pole Telescope (SPT) and CMB-S4 will increase sensitivity and allow for more precise measurements.

preprint 2023 arXiv

The Target-selection Pipeline for the Dark Energy Spectroscopic Instrument

In 2021 May, the Dark Energy Spectroscopic Instrument (DESI) began a 5 yr survey of approximately 50 million total extragalactic and Galactic targets. The primary DESI dark-time targets are emission line galaxies (ELGs), luminous red galaxies (LRGs) and quasars (QSOs). In bright time, DESI will focus on two surveys known as the Bright Galaxy Survey (BGS) and the Milky Way Survey (MWS). DESI also observes a selection of "secondary" targets for bespoke science goals. This paper gives an overview of the publicly available pipeline (desitarget) used to process targets for DESI observations. Highlights include details of the different DESI survey targeting phases, the targeting ID (TARGETID) used to define unique targets, the bitmasks used to indicate a particular type of target, the data model and structure of DESI targeting files, and examples of how to access and use the desitarget code base. This paper will also describe "supporting" DESI target classes, such as standard stars, sky locations, and random catalogs that mimic the angular selection function of DESI targets. The DESI target selection pipeline is complex and sizable; this paper attempts to summarize the most salient information required to understand and work with DESI targeting data.
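The bitmask idea mentioned in this abstract can be illustrated with a minimal sketch. It does not use the desitarget code base; the class `TargetBits`, its bit positions, and the helper names are hypothetical and chosen only to show how one integer can flag several target classes at once.

```python
from enum import IntFlag


class TargetBits(IntFlag):
    """Illustrative target classes; the bit positions are hypothetical."""
    LRG = 1 << 0
    ELG = 1 << 1
    QSO = 1 << 2
    BGS = 1 << 3
    MWS = 1 << 4


def has_class(bitmask: int, target_class: TargetBits) -> bool:
    """Return True if the bit for target_class is set in the integer bitmask."""
    return bool(bitmask & target_class)


def class_names(bitmask: int) -> list[str]:
    """List the names of every class whose bit is set, in definition order."""
    return [t.name for t in TargetBits if bitmask & t]


# A hypothetical target selected as both an LRG and a QSO candidate.
example_mask = int(TargetBits.LRG | TargetBits.QSO)
```

Storing the classes as bits lets a single integer column record that a target passed more than one selection, and a bitwise AND recovers each class independently.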

preprint 2022 arXiv

A Spectroscopic Road Map for Cosmic Frontier: DESI, DESI-II, Stage-5

In this white paper, we present an experimental road map for spectroscopic experiments beyond DESI. DESI will be a transformative cosmological survey in the 2020s, mapping 40 million galaxies and quasars and capturing a significant fraction of the available linear modes up to z=1.2. DESI-II will pilot observations of galaxies both at much higher densities and extending to higher redshifts. A Stage-5 experiment would build out those high-density and high-redshift observations, mapping hundreds of millions of stars and galaxies in three dimensions, to address the problems of inflation, dark energy, light relativistic species, and dark matter. These spectroscopic data will also complement the next generation of weak lensing, line intensity mapping and CMB experiments and allow them to reach their full potential.

preprint 2022 arXiv

Angular clustering properties of the DESI QSO target selection using DR9 Legacy Imaging Surveys

The quasar target selection for the upcoming survey of the Dark Energy Spectroscopic Instrument (DESI) will be fixed for the next five years. The aim of this work is to validate the quasar selection by studying the impact of imaging systematics as well as stellar and galactic contaminants, and to develop a procedure to mitigate them. Density fluctuations of quasar targets are found to be related to photometric properties such as seeing and depth of the Data Release 9 of the DESI Legacy Imaging Surveys. To model this complex relation, we explore machine learning algorithms (Random Forest and Multi-Layer Perceptron) as an alternative to the standard linear regression. Splitting the footprint of the Legacy Imaging Surveys into three regions according to photometric properties, we perform an independent analysis in each region, validating our method using eBOSS EZ-mocks. The mitigation procedure is tested by comparing the angular correlation of the corrected target selection on each photometric region to the angular correlation function obtained using quasars from the Sloan Digital Sky Survey (SDSS) Data Release 16. With our procedure, we recover a similar level of correlation between DESI quasar targets and SDSS quasars in two-thirds of the total footprint, and we show that the excess of correlation in the remaining area is due to stellar contamination, which should be removed with DESI spectroscopic data. We derive the Limber parameters in our three imaging regions and compare them to previous measurements from SDSS and the 2dF QSO Redshift Survey.

preprint 2022 arXiv

BioSimulators: a central registry of simulation engines and services for recommending specific tools

Computational models have great potential to accelerate bioscience, bioengineering, and medicine. However, it remains challenging to reproduce and reuse simulations, in part, because the numerous formats and methods for simulating various subsystems and scales remain siloed by different software tools. For example, each tool must be executed through a distinct interface. To help investigators find and use simulation tools, we developed BioSimulators (https://biosimulators.org), a central registry of the capabilities of simulation tools and consistent Python, command-line, and containerized interfaces to each version of each tool. The foundation of BioSimulators is standards, such as CellML, SBML, SED-ML, and the COMBINE archive format, and validation tools for simulation projects and simulation tools that ensure these standards are used consistently. To help modelers find tools for particular projects, we have also used the registry to develop recommendation services. We anticipate that BioSimulators will help modelers exchange, reproduce, and combine simulations.

preprint 2022 arXiv

Cosmological constraints from the tomographic cross-correlation of DESI Luminous Red Galaxies and Planck CMB lensing

We use luminous red galaxies selected from the imaging surveys that are being used for targeting by the Dark Energy Spectroscopic Instrument (DESI) in combination with CMB lensing maps from the Planck collaboration to probe the amplitude of large-scale structure over $0.4\le z\le 1$. Our galaxy sample, with an angular number density of approximately $500\,\mathrm{deg}^{-2}$ over 18,000 sq.deg., is divided into 4 tomographic bins by photometric redshift and the redshift distributions are calibrated using spectroscopy from DESI. We fit the galaxy autospectra and galaxy-convergence cross-spectra using models based on cosmological perturbation theory, restricting to large scales that are expected to be well described by such models. Within the context of $Λ$CDM, combining all 4 samples and using priors on the background cosmology from supernova and baryon acoustic oscillation measurements, we find $S_8=σ_8(Ω_m/0.3)^{0.5}=0.73\pm 0.03$. This result is lower than the prediction of the $Λ$CDM model conditioned on the Planck data. Our data prefer a slower growth of structure at low redshift than the model predictions, though at only modest significance.
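The definition $S_8=σ_8(Ω_m/0.3)^{0.5}$ quoted in this abstract is simple enough to write out directly. The sketch below only evaluates that formula; the function names and the input values are illustrative assumptions and are not taken from the paper.

```python
def s8(sigma8: float, omega_m: float, alpha: float = 0.5) -> float:
    """Evaluate S_8 = sigma_8 * (Omega_m / 0.3) ** alpha, with alpha = 0.5 as quoted."""
    return sigma8 * (omega_m / 0.3) ** alpha


def sigma8_from_s8(s8_value: float, omega_m: float, alpha: float = 0.5) -> float:
    """Invert the definition to recover sigma_8 for a given Omega_m."""
    return s8_value / (omega_m / 0.3) ** alpha


# Illustrative inputs only: at Omega_m = 0.3 the two quantities coincide.
example_s8 = s8(sigma8=0.8, omega_m=0.3)
```

Because the exponent is fixed, a lower $Ω_m$ at fixed $σ_8$ lowers $S_8$, which is why the two parameters are usually constrained through this combination.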

preprint 2022 arXiv

Deep Learning of DESI Mock Spectra to Find Damped Lyα Systems

We have updated and applied a convolutional neural network (CNN) machine learning model to discover and characterize damped Ly$α$ systems (DLAs) based on Dark Energy Spectroscopic Instrument (DESI) mock spectra. We have optimized the training process and constructed a CNN model that yields a DLA classification accuracy above 99$\%$ for spectra which have signal-to-noise (S/N) above 5 per pixel. Classification accuracy is the rate of correct classifications. This accuracy remains above 97$\%$ for lower signal-to-noise (S/N) $\approx1$ spectra. This CNN model provides estimations for redshift and HI column density with standard deviations of 0.002 and 0.17 dex for spectra with S/N above 3 per pixel. Also, this DLA finder is able to identify overlapping DLAs and sub-DLAs. Further, the impact of different DLA catalogs on the measurement of Baryon Acoustic Oscillation (BAO) is investigated. The cosmological fitting parameter result for BAO has less than $0.61\%$ difference compared to analysis of the mock results with perfect knowledge of DLAs. This difference is lower than the statistical error for the first year estimated from the mock spectra, which is above $1.7\%$. We also compared the performance of the CNN and a Gaussian Process (GP) model. Our improved CNN model has moderately higher purity (by 14$\%$) and completeness (by 7$\%$) than an older version of the GP code, for S/N $>$ 3. Both codes provide good DLA redshift estimates, but the GP produces a better column density estimate, with a $24\%$ smaller standard deviation. A credible DLA catalog for the DESI main survey can be provided by combining these two algorithms.
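The accuracy, purity and completeness figures quoted here can be made concrete with a small sketch. Accuracy follows the abstract's own definition (the rate of correct classifications); the purity and completeness formulas are the conventional ones and are assumed, not confirmed, to match the paper's usage. The counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for a binary DLA / non-DLA classifier."""
    tp: int  # true DLAs flagged as DLAs
    fp: int  # non-DLAs flagged as DLAs
    fn: int  # true DLAs that were missed
    tn: int  # non-DLAs correctly rejected


def accuracy(c: Counts) -> float:
    """Fraction of all classifications that are correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def purity(c: Counts) -> float:
    """Fraction of flagged DLAs that are real (conventional definition)."""
    return c.tp / (c.tp + c.fp)


def completeness(c: Counts) -> float:
    """Fraction of real DLAs that were flagged (conventional definition)."""
    return c.tp / (c.tp + c.fn)


# Hypothetical counts, for illustration only.
example = Counts(tp=90, fp=10, fn=5, tn=895)
```

With these counts the accuracy is 0.985 while the purity is 0.9, which shows how a high accuracy can coexist with a noticeably lower purity when true DLAs are rare.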

preprint 2022 arXiv

Impala: Low-Latency, Communication-Efficient Private Deep Learning Inference

This paper proposes Impala, a new cryptographic protocol for private inference in the client-cloud setting. Impala builds upon recent solutions that combine the complementary strengths of homomorphic encryption (HE) and secure multi-party computation (MPC). A series of protocol optimizations are developed to reduce both communication and performance bottlenecks. First, we remove MPC's overwhelmingly high communication cost from the client by introducing a proxy server and developing a low-overhead key switching technique. Key switching reduces the client's bandwidth by multiple orders of magnitude; however, the communication between the proxy and cloud is still excessive. Second, we develop an optimized garbled circuit that leverages truncated secret shares for faster evaluation and less proxy-cloud communication. Finally, we propose sparse HE convolution to reduce the computational bottleneck of using HE. Compared to the state-of-the-art, these optimizations provide a bandwidth savings of over 3X and speedup of 4X for private deep learning inference.

preprint 2022 arXiv

NVMExplorer: A Framework for Cross-Stack Comparisons of Embedded Non-Volatile Memories

Repeated off-chip memory accesses to DRAM drive up operating power for data-intensive applications, and SRAM technology scaling and leakage power limit the efficiency of embedded memories. Future on-chip storage will need higher density and energy efficiency, and the actively expanding field of emerging, embeddable non-volatile memory (eNVM) technologies is providing many potential candidates to satisfy this need. Each technology proposal presents distinct trade-offs in terms of density, read, write, and reliability characteristics, and we present a comprehensive framework for navigating and quantifying these design trade-offs alongside realistic system constraints and application-level impacts. This work evaluates eNVM-based storage for a range of application and system contexts including machine learning on the edge, graph analytics, and general purpose cache hierarchy, in addition to describing a freely available (http://nvmexplorer.seas.harvard.edu/) set of tools for application experts, system designers, and device experts to better understand, compare, and quantify the next generation of embedded memory solutions.

preprint 2022 arXiv

OMU: A Probabilistic 3D Occupancy Mapping Accelerator for Real-time OctoMap at the Edge

Autonomous machines (e.g., vehicles, mobile robots, drones) require sophisticated 3D mapping to perceive the dynamic environment. However, maintaining a real-time 3D map is expensive both in terms of compute and memory requirements, especially for resource-constrained edge machines. Probabilistic OctoMap is a reliable and memory-efficient 3D dense map model to represent the full environment, with dynamic voxel node pruning and expansion capacity. This paper presents the first efficient accelerator solution, i.e. OMU, to enable real-time probabilistic 3D mapping at the edge. To improve the performance, the input map voxels are updated via parallel PE units for data parallelism. Within each PE, the voxels are stored using a specially developed data structure in parallel memory banks. In addition, a pruning address manager is designed within each PE unit to reuse the pruned memory addresses. The proposed 3D mapping accelerator is implemented and evaluated using a commercial 12 nm technology. Compared to the ARM Cortex-A57 CPU in the Nvidia Jetson TX2 platform, the proposed accelerator achieves up to 62$\times$ performance and 708$\times$ energy efficiency improvement. Furthermore, the accelerator provides 63 FPS throughput, more than 2$\times$ higher than a real-time requirement, enabling real-time perception for 3D mapping.

preprint 2022 arXiv

SOAR/Goodman Spectroscopic Assessment of Candidate Counterparts of the LIGO-Virgo Event GW190814

On 2019 August 14 at 21:10:39 UTC, the LIGO/Virgo Collaboration (LVC) detected a possible neutron star-black hole merger (NSBH), the first ever identified. An extensive search for an optical counterpart of this event, designated GW190814, was undertaken using the Dark Energy Camera (DECam) on the 4m Victor M. Blanco Telescope at the Cerro Tololo Inter-American Observatory. Target of Opportunity interrupts were issued on 8 separate nights to observe 11 candidates using the 4.1m Southern Astrophysical Research (SOAR) telescope's Goodman High Throughput Spectrograph in order to assess whether any of these transients was likely to be an optical counterpart of the possible NSBH merger. Here, we describe the process of observing with SOAR, the analysis of our spectra, our spectroscopic typing methodology, and our resultant conclusion that none of the candidates corresponded to the gravitational wave merger event but were all instead other transients. Finally, we describe the lessons learned from this effort. Application of these lessons will be critical for a successful community spectroscopic follow-up program for LVC observing run 4 (O4) and beyond.

preprint 2022 arXiv

Sustainable AI: Environmental Implications, Challenges and Opportunities

This paper explores the environmental impact of the super-linear growth trends for AI from a holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the carbon footprint of AI computing by examining the model development cycle across industry-scale machine learning use cases and, at the same time, considering the life cycle of system hardware. Taking a step further, we capture the operational and manufacturing carbon footprint of AI computing and present an end-to-end analysis for what and how hardware-software design and at-scale optimization can help reduce the overall carbon footprint of AI. Based on the industry experience and lessons learned, we share the key challenges and chart out important development directions across the many dimensions of AI. We hope the key messages and insights presented in this paper can inspire the community to advance the field of AI in an environmentally-responsible manner.

preprint 2022 arXiv

The DESI $N$-body Simulation Project -- II. Suppressing sample variance with fast simulations

The Dark Energy Spectroscopic Instrument (DESI) will construct a large and precise three-dimensional map of our Universe. The survey effective volume reaches $\sim20\,h^{-3}\,\mathrm{Gpc}^3$. It is a great challenge to prepare high-resolution simulations with a much larger volume for validating the DESI analysis pipelines. AbacusSummit is a suite of high-resolution dark-matter-only simulations designed for this purpose, with $200\,h^{-3}\,\mathrm{Gpc}^3$ (10 times the DESI volume) for the base cosmology. However, further effort is needed to provide a more precise analysis of the data and to cover other cosmologies as well. Recently, the CARPool method was proposed to use paired accurate and approximate simulations to achieve high statistical precision with a limited number of high-resolution simulations. Relying on this technique, we propose to use fast quasi-$N$-body solvers combined with accurate simulations to produce accurate summary statistics. This enables us to obtain 100 times smaller variance than the expected DESI statistical variance at the scales we are interested in, e.g. $k < 0.3\,h\,\mathrm{Mpc}^{-1}$ for the halo power spectrum. In addition, it can significantly suppress the sample variance of the halo bispectrum. We further generalize the method for other cosmologies with only one realization in the AbacusSummit suite to extend the effective volume $\sim 20$ times. In summary, our proposed strategy of combining high-fidelity simulations with fast approximate gravity solvers and a series of variance suppression techniques sets the path for a robust cosmological analysis of galaxy survey data.
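The pairing of accurate and approximate simulations described here rests on the control-variate idea, which the following toy sketch illustrates. It is a generic scalar control-variate estimator under stated assumptions, not the paper's pipeline: the "simulations" are random numbers sharing a common noise term, and the surrogate mean `mu_c` is assumed to be known precisely from many cheap runs.

```python
import random


def mean(xs):
    """Arithmetic mean of a sequence."""
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Unbiased sample covariance; covariance(xs, xs) is the sample variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def control_variate(y, c, mu_c):
    """Correct each accurate sample y_i with its paired surrogate sample c_i.

    Returns the corrected samples y_i - beta * (c_i - mu_c) and the fitted beta,
    where beta = cov(y, c) / var(c) minimizes the sample variance.
    """
    beta = covariance(y, c) / covariance(c, c)
    return [yi - beta * (ci - mu_c) for yi, ci in zip(y, c)], beta


rng = random.Random(0)
# Toy paired runs: the shared term mimics matched initial conditions.
shared = [rng.gauss(0.0, 1.0) for _ in range(200)]
accurate = [10.0 + s + 0.1 * rng.gauss(0.0, 1.0) for s in shared]
approx = [9.0 + s + 0.1 * rng.gauss(0.0, 1.0) for s in shared]
mu_c = 9.0  # assumed: surrogate mean known from many cheap runs

corrected, beta = control_variate(accurate, approx, mu_c)
var_before = covariance(accurate, accurate)
var_after = covariance(corrected, corrected)
```

The stronger the correlation between the paired runs, the larger the variance reduction, which is why matching the initial conditions of the fast and accurate simulations matters.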

preprint 2022 arXiv

The MegaMapper: A Stage-5 Spectroscopic Instrument Concept for the Study of Inflation and Dark Energy

In this white paper, we present the MegaMapper concept. The MegaMapper is a proposed ground-based experiment to measure Inflation parameters and Dark Energy from galaxy redshifts at $2<z<5$. In order to achieve path-breaking results with a mid-scale investment, the MegaMapper combines existing technologies for critical path elements and pushes innovative development in other design areas. To this aim, we envision a 6.5-m Magellan-like telescope, with a newly designed wide field, coupled with DESI spectrographs, and small-pitch robots to achieve multiplexing of at least 26,000. This will match the expected achievable target density in the redshift range of interest and provide a 10x capability over the existing state of the art, without a 10x increase in project budget.

preprint 2022 arXiv

Trireme: Exploring Hierarchical Multi-Level Parallelism for Domain Specific Hardware Acceleration

The design of heterogeneous systems that include domain specific accelerators is a challenging and time-consuming process. While taking into account area constraints, designers must decide which parts of an application to accelerate in hardware and which to leave in software. Moreover, applications in domains such as Extended Reality (XR) offer opportunities for various forms of parallel execution, including loop level, task level and pipeline parallelism. To assist the design process and expose every possible level of parallelism, we present Trireme, a fully automated tool-chain that explores multiple levels of parallelism and produces domain specific accelerator designs and configurations that maximize performance, given an area budget. Experiments on demanding benchmarks from the XR domain revealed a speedup of up to 20x, as well as a speedup of up to 37x for smaller applications, compared to software-only implementations.

preprint 2021 arXiv

Installation of the Dark Energy Spectroscopic Instrument at the Mayall 4-meter telescope

The Dark Energy Spectroscopic Instrument (DESI) is a Stage IV ground-based dark energy experiment that will measure the expansion history of the Universe using the Baryon Acoustic Oscillation technique. The spectra of 35 million galaxies and quasars over 14000 square degrees will be measured during the life of the experiment. We describe the installation of the major elements of the instrument at the Mayall 4m telescope, completed in late 2019. The previous prime focus corrector, spider vanes, and upper rings were removed from the Mayall's Serrurier truss and replaced with the newly-constructed DESI ring, vanes, cage, hexapod, and optical corrector. The new corrector was optically aligned with the primary mirror using a laser tracker system. The DESI focal plane system was integrated to the corrector, with each of its ten 500-fiber-positioner petal segments installed using custom installation hardware and the laser tracker. Ten DESI spectrographs with 30 cryostats were installed in a newly assembled clean room in the Large Coude Room. The ten cables carrying 5000 optical fibers from the positioners in the focal plane were routed down the telescope through cable wraps at the declination and hour angle axes, and their integral slitheads were integrated with the ten spectrographs. The fiber view camera assembly was installed to the Mayall's primary mirror cell. Servers for the instrument control system replaced existing computer equipment. The fully integrated instrument has been commissioned and is ready to start its operations phase.

preprint 2021 arXiv

Logic Compatible High-Performance Ferroelectric Transistor Memory

Silicon ferroelectric field-effect transistors (FeFETs) with a low-k interfacial layer (IL) between the ferroelectric gate stack and the silicon channel suffer from high write voltage, limited write endurance and large read-after-write latency due to early IL breakdown and charge trapping and detrapping at the interface. We demonstrate low voltage, high speed memory operation with high write endurance using an IL-free back-end-of-line (BEOL) compatible FeFET. We fabricate IL-free FeFETs with 28nm channel length and 126nm width under a thermal budget <400C by integrating 5nm thick Hf0.5Zr0.5O2 gate stack with amorphous Indium Tungsten Oxide (IWO) semiconductor channel. We report 1.2V memory window and read current window of 10^5 for program and erase, write latency of 20ns with +/-2V write pulses, read-after-write latency <200ns, write endurance cycles exceeding 5x10^10 and 2-bit/cell programming capability. Array-level analysis establishes IL-free BEOL FeFET as a promising candidate for logic-compatible high-performance on-chip buffer memory and multi-bit weight cell for compute-in-memory accelerators.

preprint 2021 arXiv

Performance of Kitt Peak's Mayall 4-meter Telescope During DESI Commissioning

In preparation for the Dark Energy Spectroscopic Instrument (DESI), a new top end was installed on the Mayall 4-meter telescope at Kitt Peak National Observatory. The refurbished telescope and the DESI instrument were successfully commissioned on sky between 2019 October and 2020 March. Here we describe the pointing, tracking and imaging performance of the Mayall telescope equipped with its new DESI prime focus corrector, as measured by six guider cameras sampling the outer edge of DESI's focal plane. Analyzing ~500,000 guider images acquired during commissioning, we find a median delivered image FWHM of 1.1 arcseconds (in the r-band at 650 nm), with the distribution extending to a best-case value of ~0.6 arcseconds. The point spread function is well characterized by a Moffat profile with a power-law index of $β$ ~ 3.5 and little dependence of $β$ on FWHM. The shape and size of the PSF delivered by the new corrector at a field angle of 1.57 degrees are very similar to those measured with the old Mayall corrector on axis. We also find that the Mayall achieves excellent pointing accuracy (several arcseconds RMS) and minimal open-loop tracking drift (< 1 milliarcsecond per second), improvements on the telescope's pre-DESI performance. In the future, employing DESI's active focus adjustment capabilities will likely further improve the Mayall/DESI delivered image quality.
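The Moffat profile named in this abstract has a closed-form link between its width parameter and the FWHM, which the sketch below evaluates for the quoted median values (FWHM of 1.1 arcseconds, $β$ of about 3.5). It uses the standard circular Moffat form $I(r) = I_0\,[1 + (r/α)^2]^{-β}$; the function names and the unit peak normalization are assumptions made for illustration.

```python
import math


def moffat_alpha(fwhm: float, beta: float) -> float:
    """Width parameter alpha for a given FWHM: FWHM = 2 * alpha * sqrt(2**(1/beta) - 1)."""
    return fwhm / (2.0 * math.sqrt(2.0 ** (1.0 / beta) - 1.0))


def moffat_profile(r: float, fwhm: float, beta: float, peak: float = 1.0) -> float:
    """Circular Moffat intensity at radius r, in the same units as fwhm."""
    alpha = moffat_alpha(fwhm, beta)
    return peak * (1.0 + (r / alpha) ** 2) ** (-beta)


# Values quoted in the abstract: median FWHM of 1.1 arcsec and beta of about 3.5.
half_max = moffat_profile(r=0.55, fwhm=1.1, beta=3.5)
```

By construction the profile falls to half of its peak at a radius of FWHM/2, which gives a quick sanity check on the conversion between the FWHM and the width parameter.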

preprint 2021 arXiv

RecSSD: Near Data Processing for Solid State Drive Based Recommendation Inference

Neural personalized recommendation models are used across a wide variety of datacenter applications including search, social media, and entertainment. State-of-the-art models comprise large embedding tables that have billions of parameters requiring large memory capacities. Unfortunately, large and fast DRAM-based memories levy high infrastructure costs. Conventional SSD-based storage solutions offer an order of magnitude larger capacity, but have worse read latency and bandwidth, degrading inference performance. RecSSD is a near data processing based SSD memory system customized for neural recommendation inference that reduces end-to-end model inference latency by 2X compared to using COTS SSDs across eight industry-representative models.

preprint 2021 arXiv

The DESI $N$-body Simulation Project I: Testing the Robustness of Simulations for the DESI Dark Time Survey

Analysis of large galaxy surveys requires confidence in the robustness of numerical simulation methods. The simulations are used to construct mock galaxy catalogs to validate data analysis pipelines and identify potential systematics. We compare three $N$-body simulation codes, ABACUS, GADGET, and SWIFT, to investigate the regimes in which their results agree. We run $N$-body simulations at three different mass resolutions, $6.25\times10^{8}$, $2.11\times10^{9}$, and $5.00\times10^{9}~h^{-1}$M$_{\odot}$, matching phases to reduce the noise within the comparisons. We find systematic errors in the halo clustering between different codes are smaller than the DESI statistical error for $s > 20\, h^{-1}$Mpc in the correlation function in redshift space. Through the resolution comparison we find that simulations run with a mass resolution of $2.1\times10^{9}~h^{-1}$M$_{\odot}$ are sufficiently converged for systematic effects in the halo clustering to be smaller than the DESI statistical error at scales larger than $20 \, h^{-1}$Mpc. These findings show that the simulations are robust for extracting cosmological information from large scales, which is the key goal of the DESI survey. Comparing matter power spectra, we find the codes agree to within 1% for $k \leq 10~h$Mpc$^{-1}$. We also run a comparison of three initial condition generation codes and find good agreement. In addition, we include a quasi-$N$-body code, FastPM, since we plan to use it for certain DESI analyses. The impact of the halo definition and galaxy-halo relation will be presented in a follow-up study.

preprint 2021 arXiv

The DESI Sky Continuum Monitor System

The Dark Energy Spectroscopic Instrument (DESI) is an ongoing spectroscopic survey to measure the dark energy equation of state to unprecedented precision. We describe the DESI Sky Continuum Monitor System, which tracks the night sky brightness as part of a system that dynamically adjusts the spectroscopic exposure time to produce more uniform data quality and to maximize observing efficiency. The DESI dynamic exposure time calculator (ETC) will combine sky brightness measurements from the Sky Monitor with data from the guider system to calculate the exposure time to achieve uniform signal-to-noise ratio (SNR) in the spectra under various observing conditions. The DESI design includes 20 sky fibers, and these are split between two identical Sky Monitor units to provide redundancy. Each Sky Monitor unit uses an SBIG STXL-6303e CCD camera and supports an eight-position filter wheel. Both units have been completed and delivered to the Mayall Telescope at the Kitt Peak National Observatory. Commissioning results show that the Sky Monitor delivers the performance required for the ETC.

preprint 2020 arXiv

AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference

Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low word sizes as their shrinking dynamic ranges cannot adequately capture the wide data distributions commonly seen in sequence transduction models. We present AdaptivFloat, a floating-point inspired number representation format for deep learning that dynamically maximizes and optimally clips its available dynamic range, at a layer granularity, in order to create faithful encoding of neural network parameters. AdaptivFloat consistently produces higher inference accuracies compared to block floating-point, uniform, IEEE-like float or posit encodings at very low precision ($\leq$ 8-bit) across a diverse set of state-of-the-art neural network topologies. And notably, AdaptivFloat is seen surpassing baseline FP32 performance by up to +0.3 in BLEU score and -0.75 in word error rate at weight bit widths that are $\leq$ 8-bit. Experimental results on a deep neural network (DNN) hardware accelerator, exploiting AdaptivFloat logic in its computational datapath, demonstrate per-operation energy and area that is 0.9$\times$ and 1.14$\times$, respectively, that of equivalent bit width integer-based accelerator variants.

preprint 2020 arXiv

CHIPKIT: An agile, reusable open-source framework for rapid test chip development

The current trend for domain-specific architectures (DSAs) has led to renewed interest in research test chips to demonstrate new specialized hardware. Tape-outs also offer huge pedagogical value garnered from real hands-on exposure to the whole system stack. However, successful tape-outs demand hard-earned experience, and the design process is time consuming and fraught with challenges. Therefore, custom chips have remained the preserve of a small number of research groups, typically focused on circuit design research. This paper describes the CHIPKIT framework. We describe a reusable SoC subsystem which provides basic IO, an on-chip programmable host, memory and peripherals. This subsystem can be readily extended with new IP blocks to generate custom test chips. We also present an agile RTL development flow, including a code generation tool called VGEN. Finally, we outline best practices for full-chip validation across the entire design cycle.

preprint2020arXiv

Dark Energy Survey Identification of A Low-Mass Active Galactic Nucleus at Redshift 0.823 from Optical Variability

We report the identification of a low-mass AGN, DES J0218$-$0430, in a redshift $z = 0.823$ galaxy in the Dark Energy Survey (DES) Supernova field. We select DES J0218$-$0430 as an AGN candidate by characterizing its long-term optical variability alone based on DES optical broad-band light curves spanning over 6 years. An archival optical spectrum from the fourth phase of the Sloan Digital Sky Survey shows both broad Mg II and broad H$β$ lines, confirming its nature as a broad-line AGN. Archival XMM-Newton X-ray observations suggest an intrinsic hard X-ray luminosity of $L_{\rm 2-12\,keV}\sim(7.6\pm0.4)\times10^{43}$ erg s$^{-1}$, which exceeds those of the most X-ray luminous starburst galaxies, in support of an AGN driving the optical variability. Based on the broad H$β$ from the SDSS spectrum, we estimate a virial BH mass of $M_{\bullet}\approx10^{6.43}$-$10^{6.72}M_{\odot}$ (with the error denoting 1$σ$ statistical uncertainties only), consistent with the estimation from OzDES, making it the lowest mass AGN with redshift $>$ 0.4 detected in optical. We estimate the host galaxy stellar mass to be $M_{\ast}\sim10^{10.5\pm0.3}M_{\odot}$ based on modeling the multi-wavelength spectral energy distribution. DES J0218$-$0430 extends the $M_{\bullet}$-$M_{\ast}$ relation observed in luminous AGNs at $z\sim1$ to masses lower than those probed by previous work. Our work demonstrates the feasibility of using optical variability to identify low-mass AGNs at higher redshift in deeper synoptic surveys with direct implications for the upcoming Legacy Survey of Space and Time at Vera C. Rubin Observatory.

preprint2020arXiv

Dark Energy Survey Year 1 Results: Cosmological Constraints from Cluster Abundances and Weak Lensing

We perform a joint analysis of the counts and weak lensing signal of redMaPPer clusters selected from the Dark Energy Survey (DES) Year 1 dataset. Our analysis uses the same shear and source photometric redshift estimates as were used in the DES combined probes analysis. Our analysis results in surprisingly low values for $S_8 =σ_8(Ω_{\rm m}/0.3)^{0.5}= 0.65\pm 0.04$, driven by a low matter density parameter, $Ω_{\rm m}=0.179^{+0.031}_{-0.038}$, with $σ_8-Ω_{\rm m}$ posteriors in $2.4σ$ tension with the DES Y1 3x2pt results, and in $5.6σ$ tension with the Planck CMB analysis. These results include the impact of post-unblinding changes to the analysis, which did not improve the level of consistency with other data sets compared to the results obtained at the unblinding. The fact that multiple cosmological probes (supernovae, baryon acoustic oscillations, cosmic shear, galaxy clustering and CMB anisotropies), and other galaxy cluster analyses all favor significantly higher matter densities suggests the presence of systematic errors in the data or an incomplete modeling of the relevant physics. Cross checks with X-ray and microwave data, as well as independent constraints on the observable--mass relation from SZ selected clusters, suggest that the discrepancy resides in our modeling of the weak lensing signal rather than the cluster abundance. Repeating our analysis using a higher richness threshold ($λ\ge 30$) significantly reduces the tension with other probes, and points to one or more richness-dependent effects not captured by our model.
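The $S_8$ combination defined in the abstract is simple arithmetic, and a small helper makes explicit how a low matter density pulls it down at fixed $σ_8$. The function name `s8` and the input values below are illustrative assumptions, not numbers taken from the paper's posteriors.

```python
def s8(sigma8, omega_m, pivot=0.3, alpha=0.5):
    """Return S_8 = sigma_8 * (Omega_m / pivot) ** alpha."""
    return sigma8 * (omega_m / pivot) ** alpha

# Hypothetical inputs: the same sigma_8 evaluated at two matter densities.
for omega_m in (0.30, 0.20):
    print(f"Omega_m = {omega_m:.2f} -> S_8 = {s8(0.8, omega_m):.3f}")
```

At the pivot density $Ω_{\rm m}=0.3$ the combination reduces to $σ_8$ itself; below the pivot it is smaller than $σ_8$.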

preprint2020arXiv

DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference

Neural personalized recommendation is the cornerstone of a wide collection of cloud services and products, constituting significant compute demand of the cloud infrastructure. Thus, improving the execution efficiency of neural recommendation directly translates into infrastructure capacity saving. In this paper, we devise a novel end-to-end modeling infrastructure, DeepRecInfra, that adopts an algorithm and system co-design methodology to custom-design systems for recommendation use cases. Leveraging the insights from the recommendation characterization, a new dynamic scheduler, DeepRecSched, is proposed to maximize latency-bounded throughput by taking into account characteristics of inference query size and arrival patterns, recommendation model architectures, and underlying hardware systems. By doing so, system throughput is doubled across the eight industry-representative recommendation models. Finally, design, deployment, and evaluation in an at-scale production datacenter show over 30% latency reduction across a wide variety of recommendation models running on hundreds of machines.

preprint2020arXiv

Dust Reverberation Mapping in Distant Quasars from Optical and Mid-Infrared Imaging Surveys

The size of the dust torus in Active Galactic Nuclei (AGN) and their high-luminosity counterparts, quasars, can be inferred from the time delay between UV/optical accretion disk continuum variability and the response in the mid-infrared (MIR) torus emission. This dust reverberation mapping (RM) technique has been successfully applied to $\sim 70$ $z\lesssim 0.3$ AGN and quasars. Here we present first results of our dust RM program for distant quasars covered in the SDSS Stripe 82 region combining $\sim 20$-yr ground-based optical light curves with 10-yr MIR light curves from the WISE satellite. We measure a high-fidelity lag between W1-band (3.4 $μ$m) and $g$ band for 587 quasars over $0.3\lesssim z\lesssim 2$ ($\left<z\right>\sim 0.8$) and two orders of magnitude in quasar luminosity. They tightly follow (intrinsic scatter $\sim 0.17$ dex in lag) the IR lag-luminosity relation observed for $z<0.3$ AGN, revealing a remarkable size-luminosity relation for the dust torus over more than four decades in AGN luminosity, with little dependence on additional quasar properties such as Eddington ratio and variability amplitude. This study motivates further investigations in the utility of dust RM for cosmology, and strongly endorses a compelling science case for the combined 10-yr Vera C. Rubin Observatory Legacy Survey of Space and Time (optical) and 5-yr Nancy Grace Roman Space Telescope 2$μ$m light curves in a deep survey for low-redshift AGN dust RM with much lower luminosities and shorter, measurable IR lags. The compiled optical and MIR light curves for 7,384 quasars in our parent sample are made public with this work.

preprint2020arXiv

Exploiting Parallelism Opportunities with Deep Learning Frameworks

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using a performance-optimal setting in feature-rich frameworks, however, involves a non-trivial amount of performance profiling effort and often relies on domain-specific knowledge. This paper takes a deep dive into analyzing the performance impact of key design features in a machine learning framework and quantifies the role of parallelism. The observations and insights distill into a simple set of guidelines that one can use to achieve much higher training and inference speedup. Across a diverse set of real-world deep learning models, the evaluation results show that the proposed performance tuning guidelines outperform the Intel and TensorFlow recommended settings by 1.29x and 1.34x, respectively.

preprint2020arXiv

Imaging Systematics and Clustering of DESI Main Targets

We evaluate the impact of imaging systematics on the clustering of luminous red galaxies (LRG), emission-line galaxies (ELG) and quasars (QSO) targeted for the upcoming Dark Energy Spectroscopic Instrument (DESI) survey. Using Data Release 7 of the DECam Legacy Survey, we study the effects of astrophysical foregrounds, stellar contamination, differences between north galactic cap and south galactic cap measurements, and variations in imaging depth, stellar density, galactic extinction, seeing, airmass, sky brightness, and exposure time before presenting survey masks and weights to mitigate these effects. With our sanitized samples in hand, we conduct a preliminary analysis of the clustering amplitude and evolution of the DESI main targets. From measurements of the angular correlation functions, we determine power law fits $r_0 = 7.78 \pm 0.26$ $h^{-1}$Mpc, $γ= 1.98 \pm 0.02$ for LRGs and $r_0 = 5.45 \pm 0.1$ $h^{-1}$Mpc, $γ= 1.54 \pm 0.01$ for ELGs. Additionally, from the angular power spectra, we measure the linear biases and model the scale dependent biases in the weakly nonlinear regime. Both sets of clustering measurements show good agreement with survey requirements for LRGs and ELGs, attesting that these samples will enable DESI to achieve precise cosmological constraints. We also present clustering as a function of magnitude, use cross-correlations with external spectroscopy to infer $dN/dz$ and measure clustering as a function of luminosity, and probe higher order clustering statistics through counts-in-cells moments.
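The quoted power-law fits can be turned into a quick numerical check of how the two samples compare at a given separation. The sketch below assumes the conventional parameterization $ξ(r) = (r/r_0)^{-γ}$, which the abstract does not spell out; the function name `xi_power_law` and the chosen separations are illustrative.

```python
import numpy as np

# Best-fit (r0 in h^-1 Mpc, gamma) values quoted in the abstract.
FITS = {"LRG": (7.78, 1.98), "ELG": (5.45, 1.54)}

def xi_power_law(r, r0, gamma):
    """Power-law correlation function xi(r) = (r / r0) ** -gamma (assumed form)."""
    return (np.asarray(r, dtype=float) / r0) ** (-gamma)

separations = np.array([1.0, 5.0, 10.0])  # h^-1 Mpc, illustrative values
for name, (r0, gamma) in FITS.items():
    print(name, xi_power_law(separations, r0, gamma))
```

Under this form $r_0$ is the separation at which $ξ = 1$, so the larger $r_0$ of the LRG fit corresponds to stronger clustering at small separations.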

preprint2020arXiv

MLPerf Training Benchmark

Machine learning (ML) needs industry-standard performance benchmarks to support design and competitive evaluation of the many emerging software and hardware solutions for ML. But ML training presents three unique benchmarking challenges absent from other domains: optimizations that improve training throughput can increase the time to solution, training is stochastic and time to solution exhibits high variance, and software and hardware systems are so diverse that fair benchmarking with the same binary, code, and even hyperparameters is difficult. We therefore present MLPerf, an ML benchmark that overcomes these challenges. Our analysis quantitatively evaluates MLPerf's efficacy at driving performance and scalability improvements across two rounds of results from multiple vendors.

preprint2020arXiv

Optimising Automatic Morphological Classification of Galaxies with Machine Learning and Deep Learning using Dark Energy Survey Imaging

There are several supervised machine learning methods used for the application of automated morphological classification of galaxies; however, there has not yet been a clear comparison of these different methods using imaging data, or an investigation for maximising their effectiveness. We carry out a comparison between several common machine learning methods for galaxy classification (Convolutional Neural Network (CNN), K-nearest neighbour, Logistic Regression, Support Vector Machine, Random Forest, and Neural Networks) by using Dark Energy Survey (DES) data combined with visual classifications from the Galaxy Zoo 1 project (GZ1). Our goal is to determine the optimal machine learning methods when using imaging data for galaxy classification. We show that CNN is the most successful method of these ten methods in our study. Using a sample of $\sim$2,800 galaxies with visual classification from GZ1, we reach an accuracy of $\sim$0.99 for the morphological classification of Ellipticals and Spirals. Further investigation of the galaxies that have different ML and visual classifications but high predicted probabilities in our CNN usually reveals an incorrect classification provided by GZ1. We further find that the galaxies having a low probability of being either spirals or ellipticals are visually Lenticulars (S0), demonstrating that supervised learning is able to rediscover that this class of galaxy is distinct from both Es and Spirals. We confirm that $\sim$2.5\% of galaxies are misclassified by GZ1 in our study. After correcting these galaxies' labels, we improve our CNN performance to an average accuracy of over 0.99 (accuracy of 0.994 is our best result).

preprint2020arXiv

The Architectural Implications of Facebook's DNN-based Personalized Recommendation

The widespread application of deep learning has changed the landscape of computation in the data center. In particular, personalized recommendation for content ranking is now largely accomplished leveraging deep neural networks. However, despite the importance of these models and the amount of compute cycles they consume, relatively little research attention has been devoted to systems for recommendation. To facilitate research and to advance the understanding of these workloads, this paper presents a set of real-world, production-scale DNNs for personalized recommendation coupled with relevant performance metrics for evaluation. In addition to releasing a set of open-source workloads, we conduct in-depth analysis that underpins future system design and optimization for at-scale recommendation: Inference latency varies by 60% across three Intel server generations, batching and co-location of inferences can drastically improve latency-bounded throughput, and the diverse composition of recommendation models leads to different optimization strategies.

preprint2020arXiv

The Curious Case of PHL 293B: A Long-Lived Transient in a Metal-Poor Blue Compact Dwarf Galaxy

We report on small-amplitude optical variability and recent dissipation of the unusually persistent broad emission lines in the blue compact dwarf galaxy PHL 293B. The galaxy's unusual spectral features (P Cygni-like profiles with $\sim$800 km s$^{-1}$ blueshifted absorption lines) have resulted in conflicting interpretations of the nature of this source in the literature. However, analysis of new Gemini spectroscopy reveals the broad emission has begun to fade after being persistent for over a decade prior. Precise difference imaging light curves constructed with the Sloan Digital Sky Survey and the Dark Energy Survey reveal small-amplitude optical variability of $\sim$0.1 mag in the g band offset by $100\pm21$ pc from the brightest pixel of the host. The light curve is well-described by an active galactic nucleus (AGN)-like damped random walk process. However, we conclude that the origin of the optical variability and spectral features of PHL 293B is due to a long-lived stellar transient, likely a Type IIn supernova or non-terminal outburst, mimicking long-term AGN-like variability. This work highlights the challenges of discriminating between scenarios in such extreme environments, relevant to searches for AGNs in dwarf galaxies. This is the second long-lived transient discovered in a blue compact dwarf, after SDSS1133. Our result implies such long-lived stellar transients may be more common in metal-deficient galaxies. Systematic searches for low-level variability in dwarf galaxies will be possible with the upcoming Legacy Survey of Space and Time at Vera C. Rubin Observatory.
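A damped random walk, the model named in the abstract, is commonly simulated as an Ornstein-Uhlenbeck process sampled at the observation times. The sketch below is a generic illustration under that assumption; the function name `simulate_drw` and the values of `tau`, `sigma` and `mean` are hypothetical and are not fit results from the paper.

```python
import numpy as np

def simulate_drw(times, tau, sigma, mean=0.0, seed=0):
    """Draw one damped-random-walk light curve at the given times.

    tau   : damping timescale (same units as `times`)
    sigma : long-term standard deviation of the process (mag)
    """
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float)
    x = np.empty_like(times)
    x[0] = mean + sigma * rng.standard_normal()
    for i in range(1, len(times)):
        decay = np.exp(-(times[i] - times[i - 1]) / tau)
        # The conditional mean relaxes toward `mean`; the conditional scatter
        # grows from 0 to sigma as the time gap increases.
        scatter = sigma * np.sqrt(1.0 - decay ** 2)
        x[i] = mean + (x[i - 1] - mean) * decay + scatter * rng.standard_normal()
    return x

t = np.arange(0.0, 1000.0, 5.0)  # days, illustrative cadence
light_curve = simulate_drw(t, tau=200.0, sigma=0.1, mean=20.0)
print(light_curve[:5])
```

Because a stellar transient and an AGN can both produce light curves that such a process fits well, a good damped-random-walk fit alone does not settle the origin of the variability, which is the caution the abstract raises.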

preprint2019arXiv

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing

Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations with unique irregular memory access patterns that pose a fundamental challenge to accelerate. This paper proposes a lightweight, commodity DRAM compliant, near-memory processing solution to accelerate personalized recommendation inference. The in-depth characterization of production-grade recommendation models shows that embedding operations with high model-, operator- and data-level parallelism lead to memory bandwidth saturation, limiting recommendation inference performance. We propose RecNMP which provides a scalable solution to improve system throughput, supporting a broad range of sparse embedding models. RecNMP is specifically tailored to production environments with heavy co-location of operators on a single server. Several hardware/software co-optimization techniques such as memory-side caching, table-aware packet scheduling, and hot entry profiling are studied, resulting in up to 9.8x memory latency speedup over a highly-optimized baseline. Overall, RecNMP offers 4.2x throughput improvement and 45.8% memory energy savings.
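The sparse embedding operation the abstract describes can be pictured as a gather of scattered table rows followed by a reduction. The NumPy sketch below is a plain CPU illustration of that access pattern, not RecNMP itself; the function name `embedding_sum_pool` and the toy table are assumptions.

```python
import numpy as np

def embedding_sum_pool(table, indices, lengths):
    """Sum-pool embedding rows: one pooled vector per entry of `lengths`.

    `indices` is the concatenation of the row ids for every lookup, and
    `lengths[i]` says how many of them belong to lookup i.
    """
    indices = np.asarray(indices, dtype=np.intp)
    out = np.zeros((len(lengths), table.shape[1]), dtype=table.dtype)
    start = 0
    for i, n in enumerate(lengths):
        rows = indices[start:start + n]      # arbitrary, non-contiguous row ids
        out[i] = table[rows].sum(axis=0)     # gather, then reduce
        start += n
    return out

table = np.arange(12, dtype=np.float64).reshape(4, 3)  # 4 rows, dimension 3
print(embedding_sum_pool(table, indices=[0, 3, 1], lengths=[2, 1]))
```

Each lookup reads a few full rows at data-dependent addresses and performs only one addition per element read, which is why the operation is limited by memory bandwidth rather than arithmetic.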

preprint2019arXiv

Spectral Variability of a Sample of Extreme Variability Quasars and Implications for the MgII Broad-line Region

We present new Gemini/GMOS optical spectroscopy of 16 extreme variability quasars (EVQs) that dimmed by more than 1.5 mag in the $g$ band between the Sloan Digital Sky Survey (SDSS) and the Dark Energy Survey (DES) epochs (separated by a few years in the quasar rest frame). The quasar sample covers a redshift range of $0.5 < z < 2.1$. Nearly half of these EVQs brightened significantly (by more than 0.5 mag in the $g$ band) in a few years after reaching their previous faintest state, and some EVQs showed rapid (non-blazar) variations of greater than 1-2 mag on timescales of only months. Leveraging the large dynamic range in continuum variability between the earlier SDSS and the new GMOS spectra, we explore the associated variations in the broad Mg II $\lambda2798$ line, whose variability properties have not been well studied before. The broad Mg II flux varies in the same direction as the continuum flux, albeit with a smaller amplitude, which indicates at least some portion of Mg II is reverberating to continuum changes. However, the width (FWHM) of Mg II does not vary correspondingly as the continuum changes for most objects in the sample, in contrast to the case of the broad Balmer lines. Using the width of broad Mg II to estimate the black hole mass therefore introduces a luminosity-dependent bias.