Instructions
Fill out the worksheet as you progress through the lab and discussions. Hold your worksheets until the end to turn them in as a final submission packet.
Resources / Important Links
- Google SRE Book - Implementing SLOs
- AWS High Availability Architecture Guide
- Red Hat High Availability Cluster Configuration
Downloads
The worksheet has been provided below. The document can be converted to the desired format so long as the content is preserved. For example, the .txt could be converted to a .md file.
Unit 3 Recording
Discussion Post #1
Scan the chapter here for keywords and pull out what you think will help you to better understand how to triage an incident.
Read the section called "Operational Security" in this same chapter: Building Secure and Reliable Systems
- What important concepts do you learn about how we behave during an operational response to an incident?
Discussion Post #2
Ask Google, find a blog, or ask an AI about high availability. (Here's one if you need it: AWS Real-Time Communication Whitepaper.)
- What are some important terms you read about? Why do you think understanding HA will help you when triaging incidents?
The discussion posts are done in Discord threads. Click the 'Threads' icon on the top right and search for the discussion post.
Definitions
Five 9's:
Single Point of Failure (SPOF):
Key Performance Indicators (KPIs):
Service Level Indicator (SLI):
Service Level Objective (SLO):
Service Level Agreement (SLA):
Active-Standby:
Active-Active:
Mean Time to Detect (MTTD):
Mean Time to Recover/Restore (MTTR):
Mean Time Between Failures (MTBF):
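As a rough aid for the availability terms above, the sketch below turns a number of nines into a yearly downtime budget and estimates steady-state availability from MTBF and MTTR. The function names, the 365-day year, and the sample figures are illustrative assumptions, not values from the course material.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def downtime_budget_seconds(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines (5 -> 99.999%)."""
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability, using the common MTBF / (MTBF + MTTR) estimate."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    for n in (3, 4, 5):
        minutes = downtime_budget_seconds(n) / 60
        print(f"{n} nines: about {minutes:.1f} minutes of downtime per year")
    # Hypothetical service: fails on average every 999 hours, takes 1 hour to restore.
    print(f"availability: {availability(999.0, 1.0):.4f}")
```

Under these assumptions, shortening MTTR raises availability just as lengthening MTBF does, which is one reason detection and recovery times are tracked alongside failure rates.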
Digging Deeper
- If uptime is so important to us, why is it just as important to understand how our systems can fail? Why would we focus on the thing that does not drive uptime?
- Start reading about SLOs: Implementing SLOs. How does this help you operationally? Does it make sense that keeping systems within defined parameters will help keep them operating longer?
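One way to make the SLO reading concrete is to compute an SLI from request counts and see how much of the error budget remains. This is a minimal sketch that assumes a request-based availability SLI; the function names and sample numbers are hypothetical and not taken from the reading.

```python
def sli(good_requests: int, total_requests: int) -> float:
    """Request-based SLI: the fraction of requests that were served successfully."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is missed)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # The error budget is the share of requests the SLO allows to fail.
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        raise ValueError("an SLO target of 100% leaves no error budget")
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    total, failed = 1_000_000, 250  # hypothetical month of traffic
    print(f"SLI: {sli(total - failed, total):.5f}")
    print(f"budget left: {error_budget_remaining(0.999, total, failed):.0%}")
```

With a 99.9% target, these sample counts spend about a quarter of the budget, leaving room for planned risk such as deployments before the objective is threatened.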
Reflection Questions
- What questions do you still have about this week?
- How are you going to use what you've learned in your current role?