This 2019 SRE Report examines team structure, outages, incidents, and post-incident stress. We set out to answer the question, "What impact do incidents have on organizations and the people responding to them?" Organizations are focused on building resilient systems and recovering quickly, but does this focus extend to employee resilience and recovery from post-incident stress? The report analyzes responses from 188 SREs globally across a range of industries and company sizes.
This report provides a unique view of trends and issues facing site reliability engineers and the organizations that employ them.
64% of respondents indicate the SRE role or team has been in existence for three years or less. Because the role is new, there are still many kinks to work out, such as what needs to be automated to reduce toil and which service level objectives are needed.
49% of respondents indicated they had worked on an incident in the last week. Incidents are often unanticipated and can be difficult to prepare for; some are categorized as easy and others are not. Almost 50% of respondents have worked on an outage lasting more than a day at some point in their careers.
Stress is a natural reaction to environmental situations. While a small percentage of SREs report never experiencing post-incident stress, the majority experience stress after some or all incidents. The level of stress varies depending on the severity of the incident.
67% of SREs who feel stress after every incident do not believe their company cares about their well-being.