- Title
- Fighting the Fog of War: Automated Incident Detection for Cloud Systems
- Creator
- Li, Liqun; Zhang, Xu; Gao, Feng; Yang, Li; Lin, Qingwei; Rajmohan, Saravanakumar; Xu, Zhangwei; Zhang, Dongmei; Zhao, Xin; Zhang, Hongyu; Kang, Yu; Zhao, Pu; Qiao, Bo; He, Shilin; Lee, Pochian; Sun, Jeffrey
- Relation
- 2021 USENIX Annual Technical Conference, ATC 2021. Proceedings of the USENIX Annual Technical Conference / 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI) (14-16 July, 2021), pp. 489-502
- Relation
- ARC.DP200102940 http://purl.org/au-research/grants/arc/DP200102940
- Publisher
- USENIX Association
- Resource Type
- conference paper
- Date
- 2021
- Description
- Incidents and outages dramatically degrade the availability of large-scale cloud computing systems such as AWS, Azure, and GCP. In current incident response practice, each team has only a partial view of the entire system, which makes detecting incidents like fighting in the "fog of war". As a result, mitigation takes longer and financial losses grow. In this work, we propose an automatic incident detection system, named Warden, as part of the Incident Management (IcM) platform. Warden collects alerts from different services and detects the occurrence of incidents from a global perspective. For each detected potential incident, Warden notifies the relevant on-call engineers so that they can properly prioritize their tasks and initiate cross-team collaboration. We implemented and deployed Warden in the IcM platform of Azure. Our evaluation, based on data collected over an 18-month period from 26 major services, shows that Warden is effective and outperforms the baseline methods. For the majority of successfully detected incidents (~68%), Warden is faster than humans, particularly for incidents that take a long time to detect manually.
- Subject
- automatic incident detection; different services; evaluation results; global perspective; incident detection; losses
- Identifier
- http://hdl.handle.net/1959.13/1467859
- Identifier
- uon:47921
- Identifier
- ISBN:9781939133236
- Language
- eng
- Reviewed