Source author record

Sarah Edge Mann

Sarah Edge Mann appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected

1 work
2 topics
2 close collaborators

Published work

1 published item

Preprint, 2012, arXiv

On the Reliability of RAID Systems: An Argument for More Check Drives

In this paper we address issues of reliability of RAID systems. We focus on "big data" systems with a large number of drives and advanced error correction schemes beyond RAID 6. Our RAID paradigm is based on Reed-Solomon codes, and thus we assume that the RAID consists of $N$ data drives and $M$ check drives. The RAID fails only if the combined number of failed drives and sector errors exceeds $M$, a property of Reed-Solomon codes. We review a number of models considered in the literature and build upon them to construct models usable for a large number of data and check drives. We attempt to account for a significant number of factors that affect RAID reliability, such as drive replacement or lack thereof, mistakes during service such as replacing the wrong drive, delayed repair, and the finite duration of RAID reconstruction. We evaluate the impact of sector failures that do not result in drive replacement. The reader who needs to consider large $M$ and $N$ will find applicable mathematical techniques concisely summarized here, and should be able to apply them to similar problems. Most methods are based on the theory of continuous time Markov chains, but we move beyond this framework when we consider the fixed time to rebuild broken hard drives, which we model using systems of delay and partial differential equations. One universal statement is applicable across various models: increasing the number of check drives in all cases increases the reliability of the system, and is vastly superior to other approaches of ensuring reliability such as mirroring.
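
As a rough illustration of the continuous time Markov chain approach mentioned in the abstract, the sketch below computes the mean time to data loss for a simple birth-death chain. It is not the paper's model. It assumes, purely for illustration, that every surviving drive fails independently at a constant rate `fail_rate`, that a single repair proceeds at a constant rate `repair_rate` whenever at least one drive is down, and that sector errors, service mistakes and fixed rebuild times are ignored. The function name and parameters are hypothetical.

```python
def mean_time_to_data_loss(n_data, m_check, fail_rate, repair_rate):
    """Mean time until more than m_check drives are down at once.

    Illustrative birth-death chain on k = 0..m_check failed drives;
    state m_check + 1 (data loss) is absorbing. Assumed rates:
      failure from state k: (n_data + m_check - k) * fail_rate
      repair  from state k: repair_rate if k > 0 (one repair at a time)
    """
    total_drives = n_data + m_check
    mttdl = 0.0
    hop = 0.0  # expected time to first reach k + 1 failed drives from k
    for k in range(m_check + 1):
        f_k = (total_drives - k) * fail_rate   # any surviving drive may fail
        r_k = repair_rate if k > 0 else 0.0    # nothing to repair at k = 0
        # First-passage recursion: hop_k = (1 + r_k * hop_{k-1}) / f_k
        hop = (1.0 + r_k * hop) / f_k
        mttdl += hop
    return mttdl


if __name__ == "__main__":
    # Hypothetical rates (per hour), chosen only to show the trend.
    for m in range(4):
        print(m, mean_time_to_data_loss(10, m, 1e-5, 0.1))
```

Under these assumptions, and with repair much faster than failure, each additional check drive raises the mean time to data loss by several orders of magnitude, which matches the direction of the abstract's closing claim. The specific numbers depend entirely on the assumed rates.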