Researcher profile

Robert J. Brunner

Robert J. Brunner contributes to research discovery and scholarly infrastructure.



Published work

10 published items

preprint, 2020, arXiv

Extended Isolation Forest

We present an extension to the model-free anomaly detection algorithm, Isolation Forest. This extension, named Extended Isolation Forest (EIF), resolves issues with assignment of anomaly score to given data points. We motivate the problem using heat maps for anomaly scores. These maps suffer from artifacts generated by the criteria for branching operation of the binary tree. We explain this problem in detail and demonstrate the mechanism by which it occurs visually. We then propose two different approaches for improving the situation. First we propose transforming the data randomly before creation of each tree, which results in averaging out the bias. Second, which is the preferred way, is to allow the slicing of the data to use hyperplanes with random slopes. This approach results in remedying the artifact seen in the anomaly score heat maps. We show that the robustness of the algorithm is much improved using this method by looking at the variance of scores of data points distributed along constant level sets. We report AUROC and AUPRC for our synthetic datasets, along with real-world benchmark datasets. We find no appreciable difference in the rate of convergence nor in computation time between the standard Isolation Forest and EIF.
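The preferred approach in the abstract above, branching on hyperplanes with random slopes, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the normal vector is drawn from a standard normal distribution, the intercept is drawn uniformly from the bounding box of the points at the node, each tree is built on the full data set rather than a subsample, and the score uses the usual Isolation Forest normalization. All function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def goes_left(x, normal, intercept):
    """True if x lies on the non-positive side of the hyperplane (x - p) . n = 0."""
    return sum((xi - pi) * ni for xi, pi, ni in zip(x, intercept, normal)) <= 0.0


def build_tree(points, rng, depth, max_depth):
    """Recursively split points with randomly oriented hyperplanes."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}  # leaf node
    dim = len(points[0])
    # Assumption: random slope from a standard normal draw per coordinate.
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Assumption: intercept drawn uniformly inside the node's bounding box.
    intercept = [
        rng.uniform(min(p[d] for p in points), max(p[d] for p in points))
        for d in range(dim)
    ]
    left = [p for p in points if goes_left(p, normal, intercept)]
    right = [p for p in points if not goes_left(p, normal, intercept)]
    return {
        "normal": normal,
        "intercept": intercept,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus the correction for unsplit points."""
    if "size" in node:
        return depth + c_factor(node["size"])
    side = "left" if goes_left(x, node["normal"], node["intercept"]) else "right"
    return path_length(x, node[side], depth + 1)


def build_forest(points, n_trees=50, seed=0):
    """Build n_trees trees on the full point set (no subsampling in this sketch)."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(len(points)))
    return [build_tree(points, rng, 0, max_depth) for _ in range(n_trees)]


def anomaly_score(x, forest, sample_size):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    mean_path = sum(path_length(x, tree) for tree in forest) / len(forest)
    return 2.0 ** (-mean_path / c_factor(sample_size))
```

Restricting `normal` to a single non-zero coordinate would recover the axis-parallel cuts of the standard Isolation Forest, which is the branching criterion the abstract identifies as the source of the heat-map artifacts.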

preprint, 2010, arXiv

Data Mining and Machine Learning in Astronomy

We review the current state of data mining and machine learning in astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science, and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm, and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.

preprint, 2010, arXiv

Evolution of the Clustering of Photometrically Selected SDSS Galaxies

We measure the angular auto-correlation functions (w) of SDSS galaxies selected to have photometric redshifts 0.1 < z < 0.4 and absolute r-band magnitudes Mr < -21.2. We split these galaxies into five overlapping redshift shells of width 0.1 and measure w in each subsample in order to investigate the evolution of SDSS galaxies. We find that the bias increases substantially with redshift - much more so than one would expect for a passively evolving sample. We use halo-model analysis to determine the best-fit halo-occupation-distribution (HOD) for each subsample, and the best-fit models allow us to interpret the change in bias physically. In order to properly interpret our best-fit HODs, we convert each halo mass to its z = 0 passively evolved bias (bo), enabling a direct comparison of the best-fit HODs at different redshifts. We find that the minimum halo bo required to host a galaxy decreases as the redshift decreases, suggesting that galaxies with Mr < -21.2 are forming in halos at the low-mass end of the HODs over our redshift range. We use the best-fit HODs to determine the change in occupation number divided by the change in mass of halos with constant bo and we find a sharp peak at bo ~ 0.9 - corresponding to an average halo mass of ~ 10^12Msol/h. We thus present the following scenario: the bias of galaxies with Mr < -21.2 decreases as the Universe evolves because these galaxies form in halos of mass ~ 10^12Msol/h (independent of redshift), and the bias of these halos naturally decreases as the Universe evolves.

preprint, 2009, arXiv

A Cross-Correlation Analysis of Mg II Absorption Line Systems and Luminous Red Galaxies from the SDSS DR5

We analyze the cross-correlation of 2,705 unambiguously intervening Mg II (2796,2803A) quasar absorption line systems with 1,495,604 luminous red galaxies (LRGs) from the Fifth Data Release of the Sloan Digital Sky Survey within the redshift range 0.36<=z<=0.8. We confirm with high precision a previously reported weak anti-correlation of equivalent width and dark matter halo mass, measuring the average masses to be log M_h(M_[solar]h^-1)=11.29 [+0.36,-0.62] and log M_h(M_[solar]h^-1)=12.70 [+0.53,-1.16] for systems with W[2796A]>=1.4A and 0.8A<=W[2796A]<1.4A, respectively. Additionally, we investigate the significance of a number of potential sources of bias inherent in absorber-LRG cross-correlation measurements, including absorber velocity distributions and the weak lensing of background quasars, which we determine is capable of producing a 20-30% bias in angular cross-correlation measurements on scales less than 2'. We measure the Mg II - LRG cross-correlation for 719 absorption systems with v<60,000 km s^-1 in the quasar rest frame and find that these associated absorbers typically reside in dark matter haloes that are ~10-100 times more massive than those hosting unambiguously intervening Mg II absorbers. Furthermore, we find evidence for evolution of the redshift number density, dN/dz, with 2-sigma significance for the strongest (W>2.0A) absorbers in the DR5 sample. This width-dependent dN/dz evolution does not significantly affect the recovered equivalent width-halo mass anti-correlation and adds to existing evidence that the strongest Mg II absorption systems are correlated with an evolving population of field galaxies at z<0.8, while the non-evolving dN/dz of the weakest absorbers more closely resembles that of the LRG population.

preprint, 2009, arXiv

Eight-Dimensional Mid-Infrared/Optical Bayesian Quasar Selection

We explore the multidimensional, multiwavelength selection of quasars from mid-IR (MIR) plus optical data, specifically from Spitzer-IRAC and the Sloan Digital Sky Survey (SDSS). We apply modern statistical techniques to combined Spitzer MIR and SDSS optical data, allowing up to 8-D color selection of quasars. Using a Bayesian selection method, we catalog 5546 quasar candidates to an 8.0 um depth of 56 uJy over an area of ~24 sq. deg; ~70% of these candidates are not identified by applying the same Bayesian algorithm to 4-color SDSS optical data alone. Our selection recovers 97.7% of known type 1 quasars in this area and greatly improves the effectiveness of identifying 3.5<z<5 quasars. Even using only the two shortest wavelength IRAC bandpasses, it is possible to use our Bayesian techniques to select quasars with 97% completeness and as little as 10% contamination. This sample has a photometric redshift accuracy of 93.6% (Delta Z +/-0.3), remaining roughly constant when the two reddest MIR bands are excluded. While our methods are designed to find type 1 (unobscured) quasars, as many as 1200 of the objects are type 2 (obscured) quasar candidates. Coupling deep optical imaging data with deep mid-IR data could enable selection of quasars in significant numbers past the peak of the quasar luminosity function (QLF) to at least z~4. Such a sample would constrain the shape of the QLF and enable quasar clustering studies over the largest range of redshift and luminosity to date, yielding significant gains in our understanding of quasars and the evolution of galaxies.

preprint, 2009, arXiv

Halo-model Analysis of the Clustering of Photometrically Selected Galaxies from SDSS

We measure the angular 2-point correlation functions of galaxies in a volume limited, photometrically selected galaxy sample from the fifth data release of the Sloan Digital Sky Survey. We split the sample both by luminosity and galaxy type and use a halo-model analysis to find halo-occupation distributions that can simultaneously model the clustering of all, early-, and late-type galaxies in a given sample. Our results for the full galaxy sample are generally consistent with previous results using the SDSS spectroscopic sample, taking the differences between the median redshifts of the photometric and spectroscopic samples into account. We find that our early- and late- type measurements cannot be fit by a model that allows early- and late-type galaxies to be well-mixed within halos. Instead, we introduce a new model that segregates early- and late-type galaxies into separate halos to the maximum allowed extent. We determine that, in all cases, it provides a good fit to our data and thus provides a new statistical description of the manner in which early- and late-type galaxies occupy halos.

preprint, 2008, arXiv

Normalization of the Matter Power Spectrum via Higher-Order Angular Correlations of Luminous Red Galaxies

We present a novel technique to measure $σ_8$, by measuring the dependence of the second-order bias of a density field on $σ_8$ using two separate techniques. Each technique employs area-averaged angular correlation functions ($\bar{ω}_N$), one relying on the shape of $\bar{ω}_2$, the other relying on the amplitude of $s_3$ ($s_3 = \bar{ω}_3/\bar{ω}_2^2$). We confirm the validity of the method by testing it on a mock catalog drawn from Millennium Simulation data and finding $σ_8^{measured}- σ_8^{true} = -0.002 \pm 0.062$. We create a catalog of photometrically selected LRGs from SDSS DR5 and separate it into three distinct data sets by photometric redshift, with median redshifts of 0.47, 0.53, and 0.61. Measurements of $c_2$ and $σ_8$ are made for each data set, assuming flat geometry and WMAP3 best-fit priors on $Ω_m$, $h$, and $Γ$. We find, with increasing redshift, $c_2 = 0.09 \pm 0.04$, $0.09 \pm 0.05$, and $0.09 \pm 0.03$ and $σ_8 = 0.78 \pm 0.08$, $0.80 \pm 0.09$, and $0.80 \pm 0.09$. We combine these three consistent $σ_8$ measurements to produce the result $σ_8 = 0.79 \pm 0.05$. Allowing the parameters $Ω_m$, $h$, and $Γ$ to vary within their WMAP3 1$σ$ error, we find that the best-fit $σ_8$ does not change by more than 8% and we are thus confident our measurement is accurate to within 10%. We anticipate that future surveys, such as Pan-STARRS, DES, and LSST, will be able to employ this method to measure $σ_8$ to great precision, and that it will serve as an important complementary check on the values determined via more established methods.
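The abstract does not state how the three per-shell values were combined. As a hedged illustration, one common choice is an inverse-variance weighted mean, which assumes the three measurements are independent with Gaussian errors. Under that assumption the sketch below gives a value that rounds to the quoted 0.79 ± 0.05. The function name is hypothetical.

```python
import math


def inverse_variance_mean(values, sigmas):
    """Combine independent measurements, weighting each by 1 / sigma^2.

    Assumption: errors are Gaussian and uncorrelated, which the abstract
    does not confirm.
    """
    weights = [1.0 / s**2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


# Per-shell sigma_8 values and errors quoted in the abstract.
sigma8, error = inverse_variance_mean([0.78, 0.80, 0.80], [0.08, 0.09, 0.09])
print(f"sigma_8 = {sigma8:.2f} +/- {error:.2f}")
```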

preprint, 2007, arXiv

Robust Machine Learning Applied to Astronomical Datasets II: Quantifying Photometric Redshifts for Quasars Using Instance-Based Learning

We apply instance-based machine learning in the form of a k-nearest neighbor algorithm to the task of estimating photometric redshifts for 55,746 objects spectroscopically classified as quasars in the Fifth Data Release of the Sloan Digital Sky Survey. We compare the results obtained to those from an empirical color-redshift relation (CZR). In contrast to previously published results using CZRs, we find that the instance-based photometric redshifts are assigned with no regions of catastrophic failure. Remaining outliers are simply scattered about the ideal relation, in a similar manner to the pattern seen in the optical for normal galaxies at redshifts z < ~1. The instance-based algorithm is trained on a representative sample of the data and pseudo-blind-tested on the remaining unseen data. The variance between the photometric and spectroscopic redshifts is sigma^2 = 0.123 +/- 0.002 (compared to sigma^2 = 0.265 +/- 0.006 for the CZR), and 54.9 +/- 0.7%, 73.3 +/- 0.6%, and 80.7 +/- 0.3% of the objects are within delta z < 0.1, 0.2, and 0.3 respectively. We also match our sample to the Second Data Release of the Galaxy Evolution Explorer legacy data and the resulting 7,642 objects show a further improvement, giving a variance of sigma^2 = 0.054 +/- 0.005, and 70.8 +/- 1.2%, 85.8 +/- 1.0%, and 90.8 +/- 0.7% of objects within delta z < 0.1, 0.2, and 0.3. We show that the improvement is indeed due to the extra information provided by GALEX, by training on the same dataset using purely SDSS photometry, which has a variance of sigma^2 = 0.090 +/- 0.007. Each set of results represents a realistic standard for application to further datasets for which the spectra are representative.
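A k-nearest neighbor photometric redshift estimate of the kind described above can be sketched as follows. This is an illustration, not the paper's pipeline. It assumes each object is represented by a tuple of colors, that neighbors are found by Euclidean distance, and that the estimate is the unweighted mean of the neighbors' spectroscopic redshifts. The abstract specifies none of these details, and all names are hypothetical.

```python
import math


def knn_redshift(train_colors, train_z, query_colors, k=3):
    """Estimate a redshift as the mean spectroscopic z of the k nearest training objects."""
    order = sorted(
        range(len(train_colors)),
        key=lambda i: math.dist(train_colors[i], query_colors),
    )
    nearest = order[:k]
    return sum(train_z[i] for i in nearest) / len(nearest)


def fraction_within(z_phot, z_spec, delta):
    """Fraction of objects with |z_phot - z_spec| < delta."""
    hits = sum(1 for zp, zs in zip(z_phot, z_spec) if abs(zp - zs) < delta)
    return hits / len(z_spec)


def variance(z_phot, z_spec):
    """Mean squared difference between photometric and spectroscopic redshifts."""
    return sum((zp - zs) ** 2 for zp, zs in zip(z_phot, z_spec)) / len(z_spec)
```

The `fraction_within` and `variance` helpers correspond to the two kinds of summary statistics quoted in the abstract (the share of objects within delta z of 0.1, 0.2, and 0.3, and sigma^2), evaluated on objects held out from training.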

preprint, 2006, arXiv

Quasars Probing Quasars I: Optically Thick Absorbers Near Luminous Quasars

With close pairs of quasars at different redshifts, a background quasar sightline can be used to study a foreground quasar's environment in absorption. We search 149 moderate resolution background quasar spectra, from Gemini, Keck, the MMT, and the SDSS to survey Lyman Limit Systems (LLSs) and Damped Ly-alpha systems (DLAs) in the vicinity of 1.8 < z < 4.0 luminous foreground quasars. A sample of 27 new quasar-absorber pairs is uncovered with column densities, 17.2 < log (N_HI/cm^2) < 20.9, and transverse (proper) distances of 22 kpc/h < R < 1.7 Mpc/h, from the foreground quasars. If they emit isotropically, the implied ionizing photon fluxes are a factor of ~ 5-8000 times larger than the ambient extragalactic UV background over this range of distances. The observed probability of intercepting an absorber is very high for small separations: six out of eight projected sightlines with transverse separations R < 150 kpc/h have an absorber coincident with the foreground quasar, of which four have log N_HI > 10^19. The covering factor of log N_HI > 10^19 absorbers is thus ~ 50 % (4/8) on these small scales, whereas < 2% would have been expected at random. There are many cosmological applications of these new sightlines: they provide laboratories for studying fluorescent Ly-alpha recombination radiation from LLSs, constrain the environments, emission geometry, and radiative histories of quasars, and shed light on the physical nature of LLSs and DLAs.

preprint, 2005, arXiv

Binary Quasars in the Sloan Digital Sky Survey: Evidence for Excess Clustering on Small Scales

We present a sample of 218 new quasar pairs with proper transverse separations R_prop < 1 Mpc/h over the redshift range 0.5 < z < 3.0, discovered from an extensive follow up campaign to find companions around the Sloan Digital Sky Survey and 2dF Quasar Redshift Survey quasars. This sample includes 26 new binary quasars with separations R_prop < 50 kpc/h (theta < 10 arcseconds), more than doubling the number of such systems known. We define a statistical sample of binaries selected with homogeneous criteria and compute its selection function, taking into account sources of incompleteness. The first measurement of the quasar correlation function on scales 10 kpc/h < R_prop < 400 kpc/h is presented. For R_prop < 40 kpc/h, we detect an order of magnitude excess clustering over the expectation from the large scale R_prop > 3 Mpc/h quasar correlation function, extrapolated down as a power law to the separations probed by our binaries. The excess grows to ~ 30 at R_prop ~ 10 kpc/h, and provides compelling evidence that the quasar autocorrelation function gets progressively steeper on sub-Mpc scales. This small scale excess can likely be attributed to dissipative interaction events which trigger quasar activity in rich environments. Recent small scale measurements of galaxy clustering and quasar-galaxy clustering are reviewed and discussed in relation to our measurement of small scale quasar clustering.