Unit 11 - Monitoring
Overview
In this unit, we focus on Linux system monitoring with modern tools such as Grafana, Prometheus, Node Exporter, and Loki. For Linux administrators, monitoring is essential to keeping systems stable, performant, and secure across environments.
We will explore how to collect, analyze, and visualize system metrics, and discuss best practices for monitoring and dashboard design that make troubleshooting faster and system management more proactive.
Learning Objectives
By the end of this unit, you will be able to:
- Explain core monitoring concepts like metrics, logs, SLOs, SLIs, and KPIs
- Set up Prometheus and Node Exporter to collect system metrics
- Use Grafana to create dashboards for visualizing system health and performance
- Write and execute PromQL queries to analyze system data
- Interpret monitoring data to diagnose system issues and support teams with actionable insights
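As a small preview of the PromQL objective, the sketch below builds the URL for an instant query against the Prometheus HTTP API. It assumes a Prometheus server on its default port on `localhost` and a Node Exporter target already being scraped; the helper name `build_query_url` and the constant names are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode

# Assumption: Prometheus is reachable on its default port on this host.
PROMETHEUS_URL = "http://localhost:9090"

# PromQL: percentage of CPU time not spent idle, averaged per instance
# over the last 5 minutes, based on Node Exporter's CPU counter.
CPU_BUSY_PERCENT = (
    "100 - (avg by (instance) "
    '(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'
)


def build_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Return the instant-query URL for a PromQL expression (hypothetical helper)."""
    # urlencode percent-escapes the braces, quotes, and spaces in the expression.
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


if __name__ == "__main__":
    # Paste the printed URL into a browser or pass it to curl to run the query.
    print(build_query_url(CPU_BUSY_PERCENT))
```

The same expression can be pasted directly into the Prometheus expression browser or a Grafana panel; building the URL by hand is only useful when scripting checks.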
Key Terms and Definitions
| Abbreviation | Term |
|---|---|
| SLO | Service Level Objective |
| SLA | Service Level Agreement |
| SLI | Service Level Indicator |
| KPI | Key Performance Indicator |
| MTTD | Mean Time to Detect |
| MTTR | Mean Time to Repair |
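
To make a few of these terms concrete, here is a minimal sketch of how an availability SLI can be compared against an SLO target, and how MTTD and MTTR can be averaged from incident timestamps. The `Incident` class, the function names, and the choice to measure MTTR from detection to resolution are assumptions for illustration; teams define these measurements differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record used only for this illustration."""
    started: datetime   # when the fault began
    detected: datetime  # when monitoring (or a person) noticed it
    resolved: datetime  # when service was restored


def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: the measured fraction of requests that succeeded."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return good_requests / total_requests


def meets_slo(sli: float, slo_target: float) -> bool:
    """SLO: the target the SLI is expected to reach, e.g. 0.999."""
    return sli >= slo_target


def mean_time_to_detect(incidents: list[Incident]) -> timedelta:
    """MTTD: average delay between a fault starting and being detected."""
    total = sum((i.detected - i.started for i in incidents), timedelta())
    return total / len(incidents)


def mean_time_to_repair(incidents: list[Incident]) -> timedelta:
    """MTTR: average time to restore service, measured here from detection."""
    total = sum((i.resolved - i.detected for i in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    day = datetime(2024, 1, 1)
    incidents = [
        Incident(day.replace(hour=10), day.replace(hour=10, minute=4),
                 day.replace(hour=10, minute=34)),
        Incident(day.replace(hour=12), day.replace(hour=12, minute=6),
                 day.replace(hour=12, minute=26)),
    ]
    sli = availability_sli(good_requests=9990, total_requests=10000)
    print("SLI:", sli, "meets 99.9% SLO:", meets_slo(sli, 0.999))
    print("MTTD:", mean_time_to_detect(incidents))
    print("MTTR:", mean_time_to_repair(incidents))
```

In this framing the SLI is what you measure, the SLO is the internal target for that measurement, and an SLA is the external commitment that is usually set looser than the SLO.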