Unit 11 - Monitoring

Overview

In this unit, we focus on Linux system monitoring with Grafana, Prometheus, Node Exporter, and Loki. For Linux administrators, monitoring is essential to keeping systems stable, performant, and secure across environments.

We will explore how to collect, analyze, and visualize system metrics, and discuss best practices for monitoring and dashboard design that make troubleshooting faster and support proactive system management.

Learning Objectives

By the end of this unit, you will be able to:

Explain core monitoring concepts such as metrics, logs, SLOs, SLIs, and KPIs
Set up Prometheus and Node Exporter to collect system metrics
Use Grafana to create dashboards for visualizing system health and performance
Write and execute PromQL queries to analyze system data
Interpret monitoring data to diagnose system issues and support teams with actionable insights
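
As a small preview of the PromQL objective, here is a minimal sketch in Python (standard library only) that builds an instant-query URL for the Prometheus HTTP API and parses the kind of response it returns. It assumes Prometheus is reachable on its default port at http://localhost:9090 and that Node Exporter is already being scraped. The helper names and the sample response are hypothetical and hand-written for illustration; no request is actually sent.

```python
import json
from urllib.parse import urlencode

# Assumption: Prometheus runs locally on its default port.
PROMETHEUS_URL = "http://localhost:9090"

# PromQL: percentage of CPU time spent non-idle per instance, averaged
# over the last 5 minutes, based on Node Exporter's node_cpu_seconds_total.
CPU_BUSY_QUERY = (
    "100 - (avg by (instance) "
    '(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'
)


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL (/api/v1/query) for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload: str) -> dict[str, float]:
    """Map each series' instance label to its sample value."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    return {
        # Each sample is [unix_timestamp, "value as string"].
        series["metric"].get("instance", "unknown"): float(series["value"][1])
        for series in body["data"]["result"]
    }


# Hypothetical response, shaped like an instant-vector result; not real data.
SAMPLE_RESPONSE = json.dumps(
    {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"instance": "web01:9100"}, "value": [1700000000, "12.5"]}
            ],
        },
    }
)

if __name__ == "__main__":
    print(build_query_url(PROMETHEUS_URL, CPU_BUSY_QUERY))
    print(parse_instant_vector(SAMPLE_RESPONSE))
```

In a real setup, the URL would be fetched with an HTTP client and the response body passed to `parse_instant_vector`; the same PromQL expression can also be pasted directly into the Prometheus or Grafana query editor.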

Key Terms and Definitions

SLO (Service Level Objective)	SLA (Service Level Agreement)
SLI (Service Level Indicator)	KPI (Key Performance Indicator)
MTTD (Mean Time to Detect)	MTTR (Mean Time to Repair)
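
To make these terms concrete, the following Python sketch shows one way they can be computed. It is illustrative only: the function names, request counts, and incident timestamps are hypothetical. It also assumes MTTR is measured from the start of an incident to its resolution; some teams measure it from detection instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


def availability_sli(good_events: int, total_events: int) -> float:
    """SLI: fraction of events (e.g. requests) that were successful."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget_remaining(sli: float, slo: float) -> float:
    """Fraction of the error budget still unspent (negative if the SLO is missed)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be between 0 and 1, exclusive")
    allowed_failure = 1.0 - slo   # failures the SLO tolerates
    actual_failure = 1.0 - sli    # failures actually observed
    return 1.0 - actual_failure / allowed_failure


@dataclass
class Incident:
    started: datetime    # when the problem began
    detected: datetime   # when monitoring or a person noticed it
    resolved: datetime   # when service was restored


def _mean(durations: list[timedelta]) -> timedelta:
    return sum(durations, timedelta()) / len(durations)


def mean_time_to_detect(incidents: list[Incident]) -> timedelta:
    """MTTD: average delay between an incident starting and being detected."""
    return _mean([i.detected - i.started for i in incidents])


def mean_time_to_repair(incidents: list[Incident]) -> timedelta:
    """MTTR: average time from incident start to resolution (assumed definition)."""
    return _mean([i.resolved - i.started for i in incidents])


# Hypothetical incidents, for illustration only.
INCIDENTS = [
    Incident(datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 5),
             datetime(2024, 1, 5, 10, 35)),
    Incident(datetime(2024, 1, 9, 14, 0), datetime(2024, 1, 9, 14, 15),
             datetime(2024, 1, 9, 14, 45)),
]

if __name__ == "__main__":
    sli = availability_sli(good_events=9995, total_events=10000)
    print("SLI:", sli)
    print("Error budget remaining vs. a 99.9% SLO:", error_budget_remaining(sli, 0.999))
    print("MTTD:", mean_time_to_detect(INCIDENTS))
    print("MTTR:", mean_time_to_repair(INCIDENTS))
```

The SLO is the internal target (here 99.9%), the SLI is the measurement compared against it, and an SLA is the external commitment that is usually set looser than the SLO so there is room to react before a contract is breached.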