SREcon Conversations Europe/Middle East/Africa with Štěpán Davidovič, Google Inc.
Alerting Patterns against Toil
Štěpán Davidovič, Google Inc.
Nobody likes to get paged, and especially not needlessly. But nobody likes to write an incident retrospective due to missing alerts. As we focus more on alerting on user-perceived behavior (that is, SLO alerting), can we take it too far? We'll take a look at patterns and problems in managing our alerting.
Štěpán Davidovič is an SRE at Google. He currently works on internal infrastructure for automatic monitoring and alerting. In previous Google roles, he developed Canary Analysis Service, maintained an internal Cron system and was oncall for AdSense. He obtained his bachelor's degree from Czech Technical University in Prague.
Video: SREcon Conversations Europe/Middle East/Africa with Štěpán Davidovič, Google Inc., from the USENIX channel