Researcher profile

Jim Gray

Jim Gray's listed work spans database systems and the data archive of the Sloan Digital Sky Survey (SDSS).

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
6 works
0 followers
5 topics
4 close collaborators

Published work

6 published items

preprint · 2010 · arXiv

The Sloan Digital Sky Survey Quasar Catalog V. Seventh Data Release

We present the fifth edition of the Sloan Digital Sky Survey (SDSS) Quasar Catalog, which is based upon the SDSS Seventh Data Release. The catalog, which contains 105,783 spectroscopically confirmed quasars, represents the conclusion of the SDSS-I and SDSS-II quasar survey. The catalog consists of the SDSS objects that have luminosities larger than M_i = -22.0 (in a cosmology with H_0 = 70 km/s/Mpc, Omega_M = 0.3, and Omega_Lambda = 0.7), have at least one emission line with FWHM larger than 1000 km/s or have interesting/complex absorption features, are fainter than i = 15.0, and have highly reliable redshifts. The catalog covers an area of 9380 deg^2. The quasar redshifts range from 0.065 to 5.46, with a median value of 1.49; the catalog includes 1248 quasars at redshifts greater than four, of which 56 are at redshifts greater than five. The catalog contains 9210 quasars with i < 18; slightly over half of the entries have i < 19. For each object the catalog presents positions accurate to better than 0.1″ rms per coordinate, five-band (ugriz) CCD-based photometry with typical accuracy of 0.03 mag, and information on the morphology and selection method. The catalog also contains radio, near-infrared, and X-ray emission properties of the quasars, when available, from other large-area surveys. The calibrated digital spectra cover the wavelength region 3800-9200 Å at a spectral resolution of R = 2000; the spectra can be retrieved from the SDSS public database using the information provided in the catalog. Over 96% of the objects in the catalog were discovered by the SDSS. We also include a supplemental list of an additional 207 quasars with SDSS spectra whose archive photometric information is incomplete.
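The catalog's selection rules can be read as a simple predicate. The sketch below is a minimal illustration, assuming each candidate already carries its absolute magnitude M_i (computed for the stated cosmology), its widest emission-line FWHM, an absorption flag, its apparent i magnitude, and a redshift-reliability flag; the class and field names are hypothetical and not the catalog's actual columns.

```python
from dataclasses import dataclass


@dataclass
class QuasarCandidate:
    # Hypothetical field names; the real catalog columns are not given here.
    abs_mag_i: float          # absolute magnitude M_i (assumed precomputed for the stated cosmology)
    max_line_fwhm_kms: float  # widest emission-line FWHM, in km/s
    complex_absorption: bool  # "interesting/complex absorption features"
    mag_i: float              # apparent i magnitude
    reliable_redshift: bool   # redshift judged highly reliable


def meets_dr7_criteria(c: QuasarCandidate) -> bool:
    """Apply the four selection rules stated in the abstract."""
    luminous = c.abs_mag_i < -22.0        # more luminous means a more negative M_i
    broad_or_absorbed = c.max_line_fwhm_kms > 1000.0 or c.complex_absorption
    fainter_than_limit = c.mag_i > 15.0   # fainter means a larger apparent magnitude
    return luminous and broad_or_absorbed and fainter_than_limit and c.reliable_redshift


print(meets_dr7_criteria(QuasarCandidate(-23.5, 2500.0, False, 18.2, True)))  # True
```

Note the sign conventions: "larger luminosity" and "fainter" both run opposite to the raw magnitude comparison one might first write.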

preprint · 2007 · arXiv

To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?

Application designers often face the question of whether to store large objects in a filesystem or in a database. Often this decision is made for application design simplicity. Sometimes, performance measurements are also used. This paper looks at the question of fragmentation, one of the operational issues that can affect the performance and/or manageability of the system as deployed long term. As expected from the common wisdom, objects smaller than 256KB are best stored in a database, while objects larger than 1MB are best stored in the filesystem. Between 256KB and 1MB, the read:write ratio and the rate of object overwrite or replacement are important factors. We used the notion of "storage age", or number of object overwrites, as a way of normalizing wall clock time. Storage age allows our results, or similar results, to be applied across a range of read:write ratios and object replacement rates.
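The size thresholds in this abstract amount to a three-way rule of thumb. A minimal sketch, assuming binary units (1KB = 1024 bytes) and treating the two boundary sizes themselves as falling in the middle band; the function name is hypothetical, and the middle band is left undecided because the abstract says it depends on workload.

```python
KB = 1024          # assumption: binary units
MB = 1024 * KB


def blob_placement(object_size_bytes: int) -> str:
    """Rule of thumb from the abstract: small objects in the database, large ones in files."""
    if object_size_bytes < 256 * KB:
        return "database"
    if object_size_bytes > 1 * MB:
        return "filesystem"
    # 256KB..1MB: read:write ratio and overwrite/replacement rate decide.
    return "workload-dependent"


for size in (64 * KB, 512 * KB, 4 * MB):
    print(size, blob_placement(size))
```

A real deployment would replace the middle branch with measurements for its own read:write mix, which is exactly the gap the paper's storage-age analysis addresses.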

preprint · 2002 · arXiv

Online Scientific Data Curation, Publication, and Archiving

Science projects are data publishers. The scale and complexity of current and future science data changes the nature of the publication process. Publication is becoming a major project component. At a minimum, a project must preserve the ephemeral data it gathers. Derived data can be reconstructed from metadata, but metadata is ephemeral. Longer term, a project should expect some archive to preserve the data. We observe that published scientific data needs to be available forever; this gives rise to the data pyramid of versions and to data inflation, where the derived data volumes explode. As an example, this article describes the Sloan Digital Sky Survey (SDSS) strategies for data publication, data access, curation, and preservation.

preprint · 2002 · arXiv

Web Services for the Virtual Observatory

Web Services form a new, emerging paradigm to handle distributed access to resources over the Internet. There are platform-independent standards (SOAP, WSDL) that make the developers' task considerably easier. This article discusses how web services could be used in the context of the Virtual Observatory. We envisage a multi-layer architecture, with interoperating services. A well-designed lower layer consisting of simple, standard services implemented by most data providers will go a long way towards establishing a modular architecture. More complex applications can be built upon this core layer. We present two prototype applications, the SdssCutout and the SkyQuery, as examples of this layered architecture.

preprint · 1999 · arXiv

Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey

The next-generation astronomy digital archives will cover most of the universe at fine resolution in many wavelengths, from X-rays to ultraviolet, optical, and infrared. The archives will be stored at diverse geographical locations. One of the first of these projects, the Sloan Digital Sky Survey (SDSS), will create a 5-wavelength catalog over 10,000 square degrees of the sky (see http://www.sdss.org/). The 200 million objects in the multi-terabyte database will have mostly numerical attributes, defining a space of 100+ dimensions. Points in this space have highly correlated distributions. The archive will enable astronomers to explore the data interactively. Data access will be aided by a multidimensional spatial index and other indices. The data will be partitioned in many ways. Small tag objects consisting of the most popular attributes speed up frequent searches. Splitting the data among multiple servers enables parallel, scalable I/O and applies parallel processing to the data. Hashing techniques allow efficient clustering and pairwise comparison algorithms that parallelize nicely. Randomly sampled subsets allow debugging otherwise large queries at the desktop. Central servers will operate a data pump that supports sweeping searches that touch most of the data. The anticipated queries require special operators related to angular distances and complex similarity tests of object properties, like shapes, colors, velocity vectors, or temporal behaviors. These issues pose interesting data management challenges.
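The abstract pairs an angular-distance operator with hashing for pairwise comparison. One way to combine them, sketched here as an illustrative choice rather than the SDSS archive's actual index: map (RA, Dec) in degrees to unit vectors, hash them into a 3-D grid whose cell edge equals the chord length of the search radius, and compare only objects in the same or adjacent cells. All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def unit_vector(ra_deg: float, dec_deg: float) -> tuple[float, float, float]:
    """Convert (RA, Dec) in degrees to a point on the unit sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def angular_distance_deg(p, q) -> float:
    """Angle between unit vectors via atan2(|p x q|, p . q), stable for tiny separations."""
    cx = p[1] * q[2] - p[2] * q[1]
    cy = p[2] * q[0] - p[0] * q[2]
    cz = p[0] * q[1] - p[1] * q[0]
    dot = p[0] * q[0] + p[1] * q[1] + p[2] * q[2]
    return math.degrees(math.atan2(math.sqrt(cx * cx + cy * cy + cz * cz), dot))


def close_pairs(objects, radius_deg: float) -> list[tuple[int, int]]:
    """Return index pairs (i, j), i < j, of objects at most radius_deg apart.

    Cell edge = chord length for radius_deg, so any close pair differs by at
    most one cell along each axis; RA wrap-around and poles need no special case.
    """
    if radius_deg <= 0:
        raise ValueError("radius_deg must be positive")
    cell = 2.0 * math.sin(math.radians(radius_deg) / 2.0)
    vecs = [unit_vector(ra, dec) for ra, dec in objects]
    grid = defaultdict(list)
    for idx, v in enumerate(vecs):
        grid[tuple(math.floor(c / cell) for c in v)].append(idx)

    pairs = set()
    for key, members in grid.items():
        for offset in product((-1, 0, 1), repeat=3):
            # .get() does not insert into the defaultdict, so iteration stays safe
            neighbours = grid.get(tuple(k + o for k, o in zip(key, offset)), [])
            for i in members:
                for j in neighbours:
                    if i < j and angular_distance_deg(vecs[i], vecs[j]) <= radius_deg:
                        pairs.add((i, j))
    return sorted(pairs)
```

Because each cell is processed independently, the outer loop is the part that would "parallelize nicely" across servers in the spirit of the abstract.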

preprint · 1999 · arXiv

The Sloan Digital Sky Survey and its Archive

The next-generation astronomy archives will cover most of the universe at fine resolution in many wavelengths. One of the first of these projects, the Sloan Digital Sky Survey (SDSS) will create a 5-wavelength catalog over 10,000 square degrees of the sky. The 200 million objects in the multi-terabyte database will have mostly numerical attributes, defining a space of 100+ dimensions. Points in this space have highly correlated distributions. The archive will enable astronomers to explore the data interactively. Data access will be aided by multidimensional spatial indices. The data will be partitioned in many ways. Small tag objects consisting of the most popular attributes speed up frequent searches. Splitting the data among multiple servers enables parallel, scalable I/O. Hashing techniques allow efficient clustering and pairwise comparison algorithms. Randomly sampled subsets allow debugging otherwise large queries at the desktop. Central servers will operate a data pump that supports sweeping searches that touch most of the data.
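Two of the techniques listed here, small tag objects and randomly sampled subsets, can be sketched in a few lines. This is an illustration under assumptions: `TAG_FIELDS` is a hypothetical choice of "popular" attributes, and sampling by hashing the object ID is one reproducible way to draw a random subset (the abstract does not say how SDSS drew its samples). A hash-based bucket makes the same object land in the same subset on every run, so a query debugged on the desktop sample behaves consistently.

```python
import hashlib

# Hypothetical "popular" attributes; the archive's real tag schema is not given here.
TAG_FIELDS = ("obj_id", "ra", "dec", "u", "g", "r", "i", "z")


def make_tag(record: dict) -> dict:
    """Project a full record onto the small set of frequently queried attributes."""
    return {field: record[field] for field in TAG_FIELDS}


def in_sample(obj_id: int, percent: float) -> bool:
    """Deterministic pseudo-random sample: a given obj_id always gets the same bucket."""
    digest = hashlib.sha256(str(obj_id).encode("ascii")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return bucket < percent / 100.0
```

Since the bucket is fixed per object, a 1% sample is always contained in the 5% sample, which lets a query be scaled up step by step before it is run against the full archive.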