Source author record

Jim Gray

Jim Gray appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Digital Libraries · Databases · astro-ph · astro-ph.CO · Distributed, Parallel, and Cluster Computing

Catalog footprint

What is connected

6 works

5 topics

4 close collaborators

preprint · 2010 · arXiv

The Sloan Digital Sky Survey Quasar Catalog V. Seventh Data Release

We present the fifth edition of the Sloan Digital Sky Survey (SDSS) Quasar Catalog, which is based upon the SDSS Seventh Data Release. The catalog, which contains 105,783 spectroscopically confirmed quasars, represents the conclusion of the SDSS-I and SDSS-II quasar survey. The catalog consists of the SDSS objects that have luminosities larger than M_i = -22.0 (in a cosmology with H_0 = 70 km/s/Mpc, Omega_M = 0.3, and Omega_Lambda = 0.7), have at least one emission line with FWHM larger than 1000 km/s or have interesting/complex absorption features, are fainter than i = 15.0, and have highly reliable redshifts. The catalog covers an area of 9380 deg^2. The quasar redshifts range from 0.065 to 5.46, with a median value of 1.49; the catalog includes 1248 quasars at redshifts greater than four, of which 56 are at redshifts greater than five. The catalog contains 9210 quasars with i < 18; slightly over half of the entries have i < 19. For each object the catalog presents positions accurate to better than 0.1" rms per coordinate, five-band (ugriz) CCD-based photometry with typical accuracy of 0.03 mag, and information on the morphology and selection method. The catalog also contains radio, near-infrared, and X-ray emission properties of the quasars, when available, from other large-area surveys. The calibrated digital spectra cover the wavelength region 3800-9200 Å at a spectral resolution of R = 2000; the spectra can be retrieved from the SDSS public database using the information provided in the catalog. Over 96% of the objects in the catalog were discovered by the SDSS. We also include a supplemental list of an additional 207 quasars with SDSS spectra whose archive photometric information is incomplete.
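The catalog's selection cuts can be read as a single boolean filter. The sketch below assumes a hypothetical candidate record whose field names (`abs_mag_i`, `max_line_fwhm_kms`, and so on) are illustrative rather than SDSS schema columns, and it assumes M_i has already been computed in the stated cosmology. Whether each cut is strict or inclusive at its edge is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class QuasarCandidate:
    """Hypothetical candidate record; field names are illustrative only."""
    abs_mag_i: float           # M_i, assumed precomputed for H_0=70, Omega_M=0.3, Omega_Lambda=0.7
    max_line_fwhm_kms: float   # FWHM of the broadest emission line, km/s
    complex_absorption: bool   # flagged as having interesting/complex absorption
    i_mag: float               # apparent i-band magnitude
    reliable_redshift: bool    # redshift judged highly reliable


def passes_catalog_cuts(c: QuasarCandidate) -> bool:
    # Magnitudes run backwards: a luminosity above M_i = -22.0 means M_i < -22.0.
    luminous = c.abs_mag_i < -22.0
    # Broad emission line OR interesting/complex absorption.
    broad_or_absorbed = c.max_line_fwhm_kms > 1000.0 or c.complex_absorption
    # Bright-end limit: the object must be fainter than i = 15.0.
    not_too_bright = c.i_mag > 15.0
    return luminous and broad_or_absorbed and not_too_bright and c.reliable_redshift


candidates = [
    QuasarCandidate(-23.1, 2500.0, False, 18.4, True),   # passes every cut
    QuasarCandidate(-21.5, 3000.0, False, 19.0, True),   # not luminous enough
    QuasarCandidate(-24.0, 600.0, True, 17.2, True),     # narrow lines, but complex absorption
    QuasarCandidate(-25.0, 4000.0, False, 14.6, True),   # brighter than i = 15.0
]
selected = [c for c in candidates if passes_catalog_cuts(c)]
```

Here `selected` keeps the first and third candidates. Writing the cuts as one predicate makes the "or" between the emission-line and absorption criteria explicit, since it is easy to misread in prose.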

preprint · 2007 · arXiv

To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?

Application designers often face the question of whether to store large objects in a filesystem or in a database. Often this decision is made for application design simplicity; sometimes performance measurements are also used. This paper looks at the question of fragmentation, one of the operational issues that can affect the performance and/or manageability of the system as deployed long term. As common wisdom predicts, objects smaller than 256KB are best stored in a database, while objects larger than 1MB are best stored in the filesystem. Between 256KB and 1MB, the read:write ratio and the rate of object overwrite or replacement are important factors. We used the notion of "storage age", or number of object overwrites, as a way of normalizing wall-clock time. Storage age allows our results, or similar results, to be applied across a range of read:write ratios and object replacement rates.
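The size thresholds in this abstract translate into a simple three-way rule. This is a minimal sketch: the names `suggest_placement`, `Placement`, and `storage_age` are hypothetical, 1MB is taken as 1024 * 1024 bytes, and storage age is modeled as total overwrites divided by the number of live objects (an assumed reading of "number of object overwrites", not necessarily the paper's exact formula). The middle band deliberately returns no firm answer, because the abstract says the outcome there depends on the workload.

```python
from enum import Enum

KB = 1024
MB = 1024 * KB  # assumption: binary megabyte


class Placement(Enum):
    DATABASE = "database"
    FILESYSTEM = "filesystem"
    DEPENDS_ON_WORKLOAD = "depends on read:write ratio and overwrite rate"


def suggest_placement(object_bytes: int) -> Placement:
    """Hypothetical rule of thumb following the 256KB / 1MB thresholds."""
    if object_bytes < 256 * KB:
        return Placement.DATABASE
    if object_bytes > 1 * MB:
        return Placement.FILESYSTEM
    # 256KB..1MB: no size-only answer; workload measurements are needed.
    return Placement.DEPENDS_ON_WORKLOAD


def storage_age(total_overwrites: int, live_objects: int) -> float:
    """Assumed definition: average number of times each live object was overwritten.

    Using this instead of wall-clock time lets fragmentation results from
    workloads with different replacement rates be compared on one axis.
    """
    if live_objects <= 0:
        raise ValueError("live_objects must be positive")
    return total_overwrites / live_objects
```

For example, a store of 1,000 objects that has seen 3,000 overwrites has a storage age of 3.0 under this definition, whether that took a day or a year.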

preprint · 2002 · arXiv

Online Scientific Data Curation, Publication, and Archiving

Science projects are data publishers. The scale and complexity of current and future science data change the nature of the publication process. Publication is becoming a major project component. At a minimum, a project must preserve the ephemeral data it gathers. Derived data can be reconstructed from metadata, but metadata is ephemeral. Longer term, a project should expect some archive to preserve the data. We observe that published scientific data needs to be available forever; this gives rise to the data pyramid of versions and to data inflation, where the derived data volumes explode. As an example, this article describes the Sloan Digital Sky Survey (SDSS) strategies for data publication, data access, curation, and preservation.

preprint · 2002 · arXiv

Web Services for the Virtual Observatory

Web Services form a new, emerging paradigm to handle distributed access to resources over the Internet. Platform-independent standards (SOAP, WSDL) make the developers' task considerably easier. This article discusses how web services could be used in the context of the Virtual Observatory. We envisage a multi-layer architecture with interoperating services. A well-designed lower layer, consisting of simple, standard services implemented by most data providers, will go a long way towards establishing a modular architecture. More complex applications can be built upon this core layer. We present two prototype applications, SdssCutout and SkyQuery, as examples of this layered architecture.
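The layering idea can be sketched with an in-process stand-in: a small core-layer contract that every provider implements, and a higher-layer application that talks only to that contract. Everything here is hypothetical: `ConeSearchService`, `InMemoryArchive`, and `federated_cone_search` are illustrative names, real services would be invoked over SOAP/WSDL rather than as Python objects, and this is not the actual SdssCutout or SkyQuery interface.

```python
import math
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class SkyObject:
    obj_id: str
    ra_deg: float
    dec_deg: float


class ConeSearchService(Protocol):
    """Hypothetical core-layer contract that every data provider implements."""
    name: str

    def cone_search(self, ra_deg: float, dec_deg: float, radius_deg: float) -> list[SkyObject]: ...


def angular_sep_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


@dataclass
class InMemoryArchive:
    """Stand-in for one provider's simple, standard service."""
    name: str
    objects: list[SkyObject]

    def cone_search(self, ra_deg: float, dec_deg: float, radius_deg: float) -> list[SkyObject]:
        return [o for o in self.objects
                if angular_sep_deg(ra_deg, dec_deg, o.ra_deg, o.dec_deg) <= radius_deg]


def federated_cone_search(services: list[ConeSearchService], ra_deg: float,
                          dec_deg: float, radius_deg: float) -> dict[str, list[SkyObject]]:
    """Higher-layer application: depends only on the core contract, not on any provider."""
    return {s.name: s.cone_search(ra_deg, dec_deg, radius_deg) for s in services}
```

Because the upper layer depends only on `cone_search`, a new provider joins the federation by implementing that one call, which is the modularity argument the abstract makes for a well-designed lower layer.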

preprint · 1999 · arXiv

Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey

The next-generation astronomy digital archives will cover most of the universe at fine resolution in many wavelengths, from X-rays to ultraviolet, optical, and infrared. The archives will be stored at diverse geographical locations. One of the first of these projects, the Sloan Digital Sky Survey (SDSS), will create a 5-wavelength catalog over 10,000 square degrees of the sky (see http://www.sdss.org/). The 200 million objects in the multi-terabyte database will have mostly numerical attributes, defining a space of 100+ dimensions. Points in this space have highly correlated distributions. The archive will enable astronomers to explore the data interactively. Data access will be aided by a multidimensional spatial index and other indices. The data will be partitioned in many ways. Small tag objects consisting of the most popular attributes speed up frequent searches. Splitting the data among multiple servers enables parallel, scalable I/O and applies parallel processing to the data. Hashing techniques allow efficient clustering and pairwise comparison algorithms that parallelize nicely. Randomly sampled subsets allow debugging otherwise large queries at the desktop. Central servers will operate a data pump that supports sweeping searches that touch most of the data. The anticipated queries require special operators related to angular distances and complex similarity tests of object properties, like shapes, colors, velocity vectors, or temporal behaviors. These issues pose interesting data management challenges.
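The hashing idea for pairwise comparison, combined with an angular-distance operator, can be illustrated with declination zones: bucket objects by a zone index, then compare each object only with objects in its own zone and the next zone up. If the zone height is at least the search radius, no close pair can be missed, and each zone pair can be processed independently, which is what lets this style of algorithm parallelize. This is one simple hashing scheme chosen for illustration, not necessarily the one used in the SDSS archive; `close_pairs` is a hypothetical name, and the input is assumed to be `(id, ra_deg, dec_deg)` tuples.

```python
import math
from collections import defaultdict


def unit_vector(ra_deg: float, dec_deg: float) -> tuple[float, float, float]:
    """Cartesian unit vector; angular distance then reduces to a dot product."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def close_pairs(objects: list[tuple[str, float, float]], radius_deg: float) -> list[tuple[str, str]]:
    """Return sorted id pairs whose angular separation is <= radius_deg."""
    if radius_deg <= 0:
        raise ValueError("radius_deg must be positive")
    zone_height = radius_deg  # height >= radius: close pairs share a zone or are adjacent
    zones = defaultdict(list)  # hash bucket: zone index -> [(id, unit vector)]
    for obj_id, ra, dec in objects:
        zones[math.floor((dec + 90.0) / zone_height)].append((obj_id, unit_vector(ra, dec)))

    min_dot = math.cos(math.radians(radius_deg))  # separation <= r  <=>  dot >= cos(r)
    pairs = []
    for z, members in zones.items():
        next_zone = zones.get(z + 1, [])  # .get avoids inserting keys while iterating
        for i, (id_a, va) in enumerate(members):
            # Same zone: each pair once (i+1 onward). Next zone: all members.
            for id_b, vb in members[i + 1:] + next_zone:
                dot = va[0] * vb[0] + va[1] * vb[1] + va[2] * vb[2]
                if dot >= min_dot:
                    pairs.append(tuple(sorted((id_a, id_b))))
    return sorted(pairs)
```

Using unit vectors also handles the right-ascension wrap at 0/360 degrees without special cases, since two objects at RA 359.99 and 0.01 are simply close in Cartesian space.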

preprint · 1999 · arXiv

The Sloan Digital Sky Survey and its Archive

The next-generation astronomy archives will cover most of the universe at fine resolution in many wavelengths. One of the first of these projects, the Sloan Digital Sky Survey (SDSS), will create a 5-wavelength catalog over 10,000 square degrees of the sky. The 200 million objects in the multi-terabyte database will have mostly numerical attributes, defining a space of 100+ dimensions. Points in this space have highly correlated distributions. The archive will enable astronomers to explore the data interactively. Data access will be aided by multidimensional spatial indices. The data will be partitioned in many ways. Small tag objects consisting of the most popular attributes speed up frequent searches. Splitting the data among multiple servers enables parallel, scalable I/O. Hashing techniques allow efficient clustering and pairwise comparison algorithms. Randomly sampled subsets allow debugging otherwise large queries at the desktop. Central servers will operate a data pump that supports sweeping searches that touch most of the data.
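One way to picture the data pump is as a shared scan: queries that would each touch most of the data are queued, and the archive streams the data once, offering every record to every queued query. The sketch assumes records are dicts with an `obj_id` key; `PumpQuery` and `run_data_pump` are hypothetical names, and a real pump would stream disk partitions rather than an in-memory iterable.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class PumpQuery:
    """A hypothetical queued sweeping search: a name plus a row predicate."""
    name: str
    predicate: Callable[[dict], bool]
    results: list = field(default_factory=list)


def run_data_pump(records: Iterable[dict], queries: list[PumpQuery]) -> dict[str, list]:
    """Single sequential sweep: each record is read once and shared by all queries."""
    for rec in records:          # one pass over the data, however many queries are queued
        for q in queries:
            if q.predicate(rec):
                q.results.append(rec["obj_id"])
    return {q.name: q.results for q in queries}


rows = [
    {"obj_id": 1, "r_mag": 17.0, "g_r": 0.3},
    {"obj_id": 2, "r_mag": 21.5, "g_r": 1.2},
    {"obj_id": 3, "r_mag": 19.0, "g_r": 0.9},
]
queued = [
    PumpQuery("bright", lambda r: r["r_mag"] < 18.0),
    PumpQuery("red", lambda r: r["g_r"] > 0.8),
]
answers = run_data_pump(iter(rows), queued)
```

Passing a one-shot iterator shows that the sweep reads the data exactly once: the I/O cost is paid per pass rather than per query, which is the appeal of batching sweeping searches.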
