Paper detail

Failures and Fixes: A Study of Software System Incident Response

This paper presents the results of a research study related to software system failures, with the goal of understanding how we might better evolve, maintain and support software systems in production. We have qualitatively analyzed thirty incidents: fifteen collected through in depth interviews with engineers, and fifteen sampled from publicly published incident reports (generally produced as part of postmortem reviews). Our analysis focused on understanding and categorizing how failures occurred, and how they were detected, investigated and mitigated. We also captured analytic insights related to the current state of the practice and associated challenges in the form of 11 key observations. For example, we observed that failures can cascade through a system leading to major outages; and that often engineers do not understand the scaling limits of systems they are supporting until those limits are exceeded. We argue that the challenges we have identified can lead to improvements to how systems are engineered and supported.

preprint2020arXivOpen access
0citations
0reviews
0saves
Nocode
Nodataset
0institutions

Next steps

Decide what to do with this paper

Use like or dislike for the fast social read. The more specific scholarly feedback stays available below when needed.

Log in to curate

Reading frame

Keep the important context close to the paper

Keep the important signals around this paper in one place: votes, save state, collection context, reviews and the metadata you need before deciding what to do next.

Institutions

Add specific reaction

Move through the context

Research map

Open full explorer

Move through nearby people, institutions, topics and adjacent work without leaving the paper page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Structured reviews

0 review(s)

ContributeLeave structured feedbackUse the review template when you have a concrete strength, concern or method question.Open review form

No structured reviews yet. High-signal critique starts here.

Work discussion

0 comment(s)

DiscussAdd a high-signal commentKeep quick notes, caveats and replication pointers separate from formal reviews.Open comment form

No discussion yet. The first strong comment sets the tone.