Session 11

System Performance Monitoring

Monitoring performance metrics, analyzing bottlenecks, and optimizing system resources

Learning Objectives

After completing this lab, students will be able to:

  • Understand the critical system performance metrics (CPU, memory, disk, network)
  • Use command-line tools to monitor system performance in real time
  • Analyze historical performance data with sar and system metrics
  • Identify bottlenecks and performance issues
  • Configure alerting and automated monitoring

Background Theory

Key Performance Indicators
CPU Usage

Percentage of processor time spent in each category (user%, system%, idle%, wait%)

Memory Utilization

RAM usage (used, free, cached, buffered) and swap activity

Disk I/O

Read/write operations, throughput, latency, dan queue depth

Network Throughput

Bandwidth usage, packet statistics, error rates
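
The CPU percentages listed above can be derived by hand from two samples of /proc/stat, which is roughly what top and sar do internally. The following is a minimal sketch, assuming a Linux system; `cpu_breakdown` is a hypothetical helper name, and "user" here includes nice time.

```shell
# cpu_breakdown is a hypothetical helper, not a standard tool.
# $1 and $2 are the aggregate "cpu ..." lines of /proc/stat from an
# earlier and a later sample.
cpu_breakdown() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) before[i] = $i }
        NR == 2 {
            # Fields 2-9: user nice system idle iowait irq softirq steal.
            # Guest time (fields 10-11) is already counted inside user/nice.
            total = 0
            for (i = 2; i <= 9; i++) { delta[i] = $i - before[i]; total += delta[i] }
            if (total <= 0) { print "no change between samples"; exit }
            printf "user=%.1f%% system=%.1f%% idle=%.1f%% wait=%.1f%%\n",
                100 * (delta[2] + delta[3]) / total, 100 * delta[4] / total,
                100 * delta[5] / total, 100 * delta[6] / total
        }'
}

# Usage: take two samples one second apart (Linux only).
if [ -r /proc/stat ]; then
    first=$(grep '^cpu ' /proc/stat)
    sleep 1
    second=$(grep '^cpu ' /proc/stat)
    cpu_breakdown "$first" "$second"
fi
```

Interrupt and steal time are part of the total but are not printed, so the four figures may add up to slightly less than 100%.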

Performance Troubleshooting
Load Average

Average system load over 1, 5, and 15 minutes: the number of processes running or waiting to run (on Linux, processes in uninterruptible I/O wait are counted as well)

Bottleneck Identification

A methodology for identifying the source of a performance problem

Capacity Planning

Planning capacity based on resource usage trends

Performance Baseline

Establishing normal performance levels as a reference for comparison
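
Load average is easier to interpret once it is divided by the number of CPU cores: a value near 1.0 per core suggests the CPUs are roughly fully occupied. A small sketch, assuming Linux; `load_per_core` is a hypothetical helper name.

```shell
# load_per_core is a hypothetical helper: load average divided by core count.
load_per_core() {
    # $1 = load average, $2 = number of CPU cores
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Usage: read the 1-minute load from /proc/loadavg and the core count from nproc.
if [ -r /proc/loadavg ]; then
    read -r load1 _ < /proc/loadavg
    echo "load per core: $(load_per_core "$load1" "$(nproc)")"
fi
```

For example, `load_per_core 6.00 4` prints 1.50, meaning more work is queued than four cores can run at once.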

Performance Troubleshooting Methodology
Systematic Performance Analysis:
  1. Identify the symptom - Response time slow? High resource usage?
  2. Collect performance data - Use monitoring tools
  3. Analyze and identify bottleneck - CPU, memory, disk, or network?
  4. Implement solution - Configuration changes, optimization
  5. Verify improvement - Measure before/after performance
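
Step 5 only works if the same measurements are captured before and after the change. One possible way to do that is sketched below, assuming Linux; `snapshot` is a hypothetical helper and the set of metrics it records is only an example.

```shell
# snapshot is a hypothetical helper that records a few metrics to a labelled file.
snapshot() {
    # $1 = label (for example: before, after), $2 = output directory
    out="$2/snapshot_$1.txt"
    {
        echo "label: $1"
        echo "loadavg: $(cut -d' ' -f1-3 /proc/loadavg)"
        grep -E '^(MemTotal|MemAvailable|SwapFree):' /proc/meminfo
    } > "$out"
    echo "$out"
}

# Usage: capture before and after a change, then compare the two files.
dir=$(mktemp -d)
before=$(snapshot before "$dir")
# ... apply the configuration change or optimization here ...
after=$(snapshot after "$dir")
diff "$before" "$after" || true
```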

Environment and Tool Setup

1. Install Monitoring Tools
# Update the system and install the full set of monitoring tools
sudo apt update && sudo apt upgrade -y
sudo apt install htop iotop nmon glances sysstat dstat net-tools -y

# Install additional performance tools (nethogs, iftop, and bc are used in later sections)
sudo apt install stress-ng fio iperf3 sysbench nethogs iftop bc -y

# Enable sysstat to collect historical data
sudo systemctl enable sysstat
sudo systemctl start sysstat

# Create directories for monitoring output, owned by the current user
# so that the scripts in later sections can write to them
sudo mkdir -p /monitoring/{reports,logs,scripts}
sudo chown -R "$USER" /monitoring
2. Configure Sysstat
sudo nano /etc/default/sysstat

# Make sure collection is enabled:
ENABLED="true"

# Restart the service
sudo systemctl restart sysstat

# Verify that data is being collected
ls -la /var/log/sysstat/
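
The ENABLED setting can also be checked from a script rather than by opening the file in an editor. A minimal sketch, assuming the Debian/Ubuntu config path used above; `sysstat_enabled` is a hypothetical helper name.

```shell
# sysstat_enabled is a hypothetical helper; the path argument defaults to
# the Debian/Ubuntu location of the sysstat defaults file.
sysstat_enabled() {
    conf="${1:-/etc/default/sysstat}"
    grep -Eq '^[ ]*ENABLED="?true"?' "$conf" 2>/dev/null
}

if sysstat_enabled; then
    echo "sysstat collection is enabled"
else
    echo "sysstat collection is disabled or the config file is missing"
fi
```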

Real-time System Monitoring

1. Monitoring with top and htop
# Basic top with auto-refresh
top

# htop: interactive view with color and more features
htop

# htop with a different sort key
htop -s PERCENT_CPU # Sort by CPU usage
htop -s PERCENT_MEM # Sort by memory usage
htop -s TIME # Sort by process time

# Batch mode for scripting
top -bn1 | head -20
2. Memory Monitoring with free and vmstat
# Check memory usage in human-readable units
free -h

# Refresh every 1 second
watch -n 1 free -h

# Detailed virtual memory statistics
vmstat 1 5 # 1-second interval, 5 samples
vmstat -a 1 5 # Show active/inactive memory
vmstat -s # Summary statistics
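
For scripting, a memory usage percentage can be computed directly from /proc/meminfo. The sketch below uses MemAvailable, the kernel's estimate of memory that can be given to new workloads without swapping, so page cache does not count as "used". `mem_used_pct` is a hypothetical helper name.

```shell
# mem_used_pct is a hypothetical helper.
mem_used_pct() {
    # $1 = path to a meminfo-style file (defaults to /proc/meminfo)
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%.0f\n", 100 * (total - avail) / total }' "${1:-/proc/meminfo}"
}

# Usage on a live Linux system:
if [ -r /proc/meminfo ]; then
    echo "memory in use: $(mem_used_pct)%"
fi
```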
3. Disk I/O Monitoring with iostat and iotop
# Extended disk statistics
iostat -x 1 3 # Extended stats, 1s interval, 3 reports
iostat -d 1 3 # Device report only (omits the CPU section)

# Real-time I/O per process
sudo iotop
sudo iotop -o # Show only processes or threads actually doing I/O
sudo iotop -P # Show processes only, not individual threads
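
The throughput figures that iostat reports come from counters in /proc/diskstats. As a rough illustration, the sketch below computes read and write throughput for one device from two snapshots of that file. `disk_kbps` is a hypothetical helper name; it relies on the sector counters being in 512-byte units.

```shell
# disk_kbps is a hypothetical helper.
disk_kbps() {
    # $1 = earlier snapshot file, $2 = later snapshot file,
    # $3 = device name, $4 = seconds between the snapshots
    awk -v dev="$3" -v secs="$4" '
        # Field 6 = sectors read, field 10 = sectors written.
        $3 == dev && FNR == NR { r0 = $6; w0 = $10; next }
        $3 == dev              { r1 = $6; w1 = $10 }
        END {
            # Two 512-byte sectors make one KiB.
            printf "read=%.1f KiB/s write=%.1f KiB/s\n",
                (r1 - r0) / 2 / secs, (w1 - w0) / 2 / secs
        }' "$1" "$2"
}

# Usage: snapshot /proc/diskstats twice, one second apart.
if [ -r /proc/diskstats ]; then
    a=$(mktemp); b=$(mktemp)
    cat /proc/diskstats > "$a"; sleep 1; cat /proc/diskstats > "$b"
    dev=$(awk 'NR == 1 { print $3 }' "$a")   # first listed device, as an example
    if [ -n "$dev" ]; then disk_kbps "$a" "$b" "$dev" 1; fi
    rm -f "$a" "$b"
fi
```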
4. Key Metrics Interpretation
Metric          Normal Range    Warning          Critical
CPU Usage       0-70%           70-85%           >85%
Memory Usage    0-80%           80-90%           >90%
Load Average    < CPU cores     1.5x CPU cores   >2x CPU cores
Disk I/O Wait   0-5%            5-20%            >20%
Swap Usage      0%              1-10%            >10%
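
The thresholds in this table can be applied mechanically. A minimal sketch: `classify` is a hypothetical helper that takes a measured value plus the warning and critical thresholds from the relevant row. Treating a value exactly on the warning boundary as WARNING is an assumption, since the table's ranges overlap at that point.

```shell
# classify is a hypothetical helper.
classify() {
    # $1 = measured value, $2 = warning threshold, $3 = critical threshold
    awk -v v="$1" -v warn="$2" -v crit="$3" 'BEGIN {
        if (v > crit)       print "CRITICAL"
        else if (v >= warn) print "WARNING"
        else                print "NORMAL"
    }'
}

classify 65 70 85    # CPU usage 65%    -> NORMAL
classify 83 80 90    # memory usage 83% -> WARNING
classify 25 5 20     # I/O wait 25%     -> CRITICAL
```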

Historical Performance Data with SAR

1. Historical CPU Data Analysis
# CPU usage for today
sar -u

# CPU usage for a specific date
sar -u -f /var/log/sysstat/sa15 # sa15 = day 15 of the month; change to the day you need

# Export to a file
sar -u > /monitoring/reports/cpu_usage.txt

# CPU usage per core
sar -P ALL 1 3
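
A full day of sar -u output is long, and usually the interesting part is the busiest interval. The sketch below picks the sample with the lowest %idle from sar text on standard input. `sar_busiest` is a hypothetical helper name, and it assumes %idle is the last column, as in the default sar -u layout.

```shell
# sar_busiest is a hypothetical helper.
sar_busiest() {
    # Reads `sar -u` text on stdin; prints the timestamp and %idle of the
    # sample with the least idle time. The "Average:" line is skipped.
    awk '($2 == "all" || $3 == "all") && $1 != "Average:" {
             if (min == "" || $NF + 0 < min) { min = $NF + 0; when = $1 }
         }
         END { if (when != "") printf "%s idle=%.2f%%\n", when, min }'
}

# Usage: LC_ALL=C is set so that timestamps are not split into time + AM/PM.
if command -v sar >/dev/null 2>&1; then
    LC_ALL=C sar -u 2>/dev/null | sar_busiest
fi
```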
2. Historical Memory Analysis
# Historical memory usage
sar -r
sar -S # Swap usage

# Combine multiple metrics
sar -ur 1 3 # CPU and memory, 1-second interval, 3 samples

# Paging statistics
sar -B # Paging activity
sar -W # Swap statistics
3. Historical Disk I/O
# Historical disk activity
sar -d
sar -b # I/O and transfer rates

# Block device statistics
sar -dp 1 5 # -p prints readable device names

# Kernel table status (this is not filesystem space usage)
sar -v # Inode, file handle, and other kernel tables
4. Historical Network Statistics
# Network interface statistics
sar -n DEV # Network devices
sar -n EDEV # Network device errors
sar -n SOCK # Sockets

# TCP statistics
sar -n TCP
sar -n ETCP # TCP errors

Advanced Monitoring Tools

1. Monitoring with NMON
# Real-time nmon monitoring
nmon

# Interactive keys for the different metrics:
c # CPU
m # Memory
d # Disk
n # Network
q # Quit

# Capture nmon data for later analysis
nmon -f -s 10 -c 30 -T -m /monitoring/logs/
# -f: write to a file in spreadsheet format, -s: interval in seconds, -c: number of snapshots,
# -T: include top processes, -m: directory to write the file to
2. All-in-one Monitoring with Glances
# Glances real-time monitoring
glances

# Enable or disable individual plugins
glances --disable-plugin diskio # Disable a specific plugin
glances --enable-plugin network # Enable a specific plugin

# Glances web interface
glances -w
# Open in a browser: http://IP-ADDRESS:61208

# Export metrics to CSV for automation
glances --export csv --export-csv-file /monitoring/reports/glances.csv
3. Network Performance Monitoring
# Real-time bandwidth monitoring (both tools need root to capture traffic)
sudo nethogs
sudo iftop

# Interface statistics with sar
sar -n DEV 1 3

# Raw interface statistics
cat /proc/net/dev
ip -s link
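
The raw counters in /proc/net/dev are cumulative byte totals, so a rate has to be computed from two readings. A minimal sketch follows; `net_rate` is a hypothetical helper name, and the interface `lo` in the usage example is only a stand-in for a real interface name.

```shell
# net_rate is a hypothetical helper.
net_rate() {
    # $1 = earlier snapshot file, $2 = later snapshot file,
    # $3 = interface name, $4 = seconds between the snapshots
    awk -v ifc="$3" -v secs="$4" '
        { sub(/:/, " ") }    # "eth0:123" -> "eth0 123", so the name is field 1
        # After the name: field 2 = received bytes, field 10 = transmitted bytes.
        $1 == ifc && FNR == NR { rx0 = $2; tx0 = $10; next }
        $1 == ifc              { rx1 = $2; tx1 = $10 }
        END {
            printf "rx=%.1f KiB/s tx=%.1f KiB/s\n",
                (rx1 - rx0) / 1024 / secs, (tx1 - tx0) / 1024 / secs
        }' "$1" "$2"
}

# Usage: snapshot /proc/net/dev twice, one second apart.
if [ -r /proc/net/dev ]; then
    a=$(mktemp); b=$(mktemp)
    cat /proc/net/dev > "$a"; sleep 1; cat /proc/net/dev > "$b"
    net_rate "$a" "$b" lo 1
    rm -f "$a" "$b"
fi
```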

Process-level Monitoring

1. Detailed Process Analysis
# Process tree visualization
pstree
pstree -p # With PIDs

# Process memory map (sudo is needed for processes owned by another user, such as sshd)
sudo pmap $(pgrep sshd | head -1)

# Process limits
cat /proc/$(pgrep sshd | head -1)/limits

# Process open files
sudo lsof -p $(pgrep sshd | head -1)
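
Much of what pmap and lsof show is also available under /proc/PID. For processes you own, a quick summary can be built without extra tools. A sketch, assuming Linux; `proc_summary` is a hypothetical helper name.

```shell
# proc_summary is a hypothetical helper.
proc_summary() {
    # $1 = PID of the process to inspect
    pid="$1"
    [ -r "/proc/$pid/status" ] || { echo "cannot read /proc/$pid" >&2; return 1; }
    name=$(awk '/^Name:/ { print $2 }' "/proc/$pid/status")
    rss=$(awk '/^VmRSS:/ { print $2 }' "/proc/$pid/status")   # resident memory in kB
    fds=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)             # open file descriptors
    echo "pid=$pid name=$name rss_kb=${rss:-0} open_fds=$fds"
}

# Usage: inspect the current shell.
proc_summary "$$"
```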
2. Debugging Processes with strace
# Trace system calls in real time
sudo strace -p $(pgrep sshd | head -1)

# Trace with a summary of call counts and time
strace -c ls /

# Trace a command and its child processes, writing to a log file
strace -f -o trace.log command

# Monitor only file-related system calls
strace -e trace=file command
3. Advanced Process Monitoring
# Process resource usage
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head -10

# Process state information
ps -eo pid,state,cmd

# Process priority and nice values
ps -eo pid,ni,pri,cmd
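
The state column is most useful in aggregate: many processes in state D (uninterruptible sleep) usually point to an I/O bottleneck, and on Linux they also raise the load average. A small sketch that counts processes per state; `count_states` is a hypothetical helper name.

```shell
# count_states is a hypothetical helper.
count_states() {
    # Reads one state code per line on stdin (for example from `ps -eo state=`).
    # Only the first character is used, so "Ss" or "R+" also work.
    awk '{ n[substr($1, 1, 1)]++ }
         END { printf "running=%d sleeping=%d uninterruptible=%d zombie=%d\n",
                      n["R"], n["S"], n["D"], n["Z"] }'
}

# Usage: "state=" prints the state column without a header line.
if command -v ps >/dev/null 2>&1; then
    ps -eo state= | count_states
fi
```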

Performance Benchmarking

1. CPU Benchmarking
# CPU stress test using various methods
stress-ng --cpu 4 --timeout 30s --metrics-brief

# CPU benchmark with sysbench
sysbench cpu --cpu-max-prime=20000 run

# Single thread vs multi-thread
sysbench cpu --threads=1 run
sysbench cpu --threads=4 run
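
To compare the single-thread and multi-thread runs, the "events per second" figures can be turned into a speedup and a scaling efficiency. A sketch under the assumption that the sysbench output contains an "events per second:" line; `events_per_sec` and `scaling_efficiency` are hypothetical helper names, and the numbers in the last line are made up for illustration.

```shell
# Both functions are hypothetical helpers.
events_per_sec() {
    # Reads sysbench output on stdin and prints the "events per second" value.
    awk -F: '/events per second/ { gsub(/[ \t]/, "", $2); print $2 }'
}

scaling_efficiency() {
    # $1 = single-thread events/s, $2 = multi-thread events/s, $3 = thread count
    awk -v s="$1" -v m="$2" -v t="$3" 'BEGIN {
        printf "speedup=%.2fx efficiency=%.0f%%\n", m / s, 100 * m / (s * t)
    }'
}

# Usage with real runs (commented out because each run takes several seconds):
# single=$(sysbench cpu --threads=1 run | events_per_sec)
# multi=$(sysbench cpu --threads=4 run | events_per_sec)
# scaling_efficiency "$single" "$multi" 4
scaling_efficiency 1000 3600 4    # illustrative numbers only
```

An efficiency well below 100% with four threads suggests fewer than four free cores, or contention from other workloads.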
2. Disk I/O Benchmark
# Sequential read/write test
fio --name=seq_read --rw=read --bs=1M --size=1G --numjobs=1 --runtime=60 --time_based

# Random read/write test
fio --name=randrw --rw=randrw --bs=4k --size=1G --numjobs=4 --runtime=60 --time_based

# Disk throughput test
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync

# Disk latency test
dd if=/dev/zero of=/tmp/test2.img bs=512 count=1000 oflag=dsync

# Remove the test files afterwards
rm -f /tmp/test1.img /tmp/test2.img
3. Memory Benchmark
# Memory stress test
stress-ng --vm 2 --vm-bytes 1G --timeout 30s --metrics-brief

# Memory bandwidth test
sysbench memory --memory-total-size=2G run

# Cache benchmark
sysbench memory --memory-block-size=1K --memory-total-size=10G run
4. Network Benchmark
# Server side (run on remote host)
iperf3 -s

# Client side throughput test
iperf3 -c server_ip -t 30

# Bidirectional test (--bidir needs iperf3 3.7 or newer; in iperf3, -d only enables debug output)
iperf3 -c server_ip -t 30 --bidir

# Multiple parallel streams
iperf3 -c server_ip -t 30 -P 4

Automated Monitoring Scripts

1. Bash Script for System Health Check
cat > /monitoring/scripts/health_check.sh << 'EOF'
#!/bin/bash
# System Health Check Script

LOG_FILE="/monitoring/logs/health_$(date +%Y%m%d).log"
THRESHOLD_CPU=80
THRESHOLD_MEM=90
THRESHOLD_DISK=85

echo "=== SYSTEM HEALTH CHECK - $(date) ===" >> "$LOG_FILE"

# CPU Check: user + system time from the second vmstat sample
# (the first sample is an average since boot)
CPU_USAGE=$(vmstat 1 2 | tail -1 | awk '{print $13 + $14}')
echo "CPU Usage: $CPU_USAGE%" >> "$LOG_FILE"

if [ "$CPU_USAGE" -gt "$THRESHOLD_CPU" ]; then
    echo "WARNING: High CPU usage detected!" >> "$LOG_FILE"
fi

# Memory Check: used / total from free
MEM_USAGE=$(free | awk '/^Mem:/ {printf "%.0f", $3/$2 * 100}')
echo "Memory Usage: $MEM_USAGE%" >> "$LOG_FILE"

if [ "$MEM_USAGE" -gt "$THRESHOLD_MEM" ]; then
    echo "WARNING: High memory usage detected!" >> "$LOG_FILE"
fi

# Disk Check: usage of the root filesystem
DISK_USAGE=$(df / | awk 'NR==2 {print $5}' | sed 's/%//')
echo "Disk Usage: $DISK_USAGE%" >> "$LOG_FILE"

if [ "$DISK_USAGE" -gt "$THRESHOLD_DISK" ]; then
    echo "WARNING: High disk usage detected!" >> "$LOG_FILE"
fi

# Load Average (1, 5, 15 minutes)
LOAD_AVG=$(cut -d' ' -f1-3 /proc/loadavg)
echo "Load Average: $LOAD_AVG" >> "$LOG_FILE"

echo "=== CHECK COMPLETED ===" >> "$LOG_FILE"
EOF

chmod +x /monitoring/scripts/health_check.sh
2. Cron Job for Automated Monitoring
# Add a cron job that runs every 5 minutes (existing entries are kept)
(crontab -l 2>/dev/null; echo "*/5 * * * * /monitoring/scripts/health_check.sh") | crontab -

# Verify the cron job
crontab -l

Alerting and Notification

1. Alerting Script for Critical Conditions
cat > /monitoring/scripts/alert_system.sh << 'EOF'
#!/bin/bash
# Alerting Script for Critical Conditions

CPU_THRESHOLD=90
MEM_THRESHOLD=95
DISK_THRESHOLD=90

# User + system CPU time from the second vmstat sample (the first is an average since boot)
CPU_USAGE=$(vmstat 1 2 | tail -1 | awk '{print $13 + $14}')
MEM_USAGE=$(free | awk '/^Mem:/ {printf "%.0f", $3/$2 * 100}')
DISK_USAGE=$(df / | awk 'NR==2 {print $5}' | sed 's/%//')

ALERT_MESSAGE=""

if [ "$CPU_USAGE" -gt "$CPU_THRESHOLD" ]; then
    ALERT_MESSAGE="CRITICAL: CPU usage is $CPU_USAGE% (Threshold: $CPU_THRESHOLD%)\n"
fi

if [ "$MEM_USAGE" -gt "$MEM_THRESHOLD" ]; then
    ALERT_MESSAGE="${ALERT_MESSAGE}CRITICAL: Memory usage is $MEM_USAGE% (Threshold: $MEM_THRESHOLD%)\n"
fi

if [ "$DISK_USAGE" -gt "$DISK_THRESHOLD" ]; then
    ALERT_MESSAGE="${ALERT_MESSAGE}CRITICAL: Disk usage is $DISK_USAGE% (Threshold: $DISK_THRESHOLD%)\n"
fi

if [ -n "$ALERT_MESSAGE" ]; then
    # Requires a working mail command (for example from the mailutils package)
    echo -e "$ALERT_MESSAGE" | mail -s "SYSTEM ALERT: $(hostname)" admin@localhost
    # For production, replace this with a Slack webhook, Telegram bot, or similar
fi
EOF

chmod +x /monitoring/scripts/alert_system.sh

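Performance Analysis and Reporting

1. Generate Performance Report
cat > /monitoring/scripts/generate_report.sh << 'EOF'
#!/bin/bash
# Performance Report Generator

REPORT_FILE="/monitoring/reports/performance_$(date +%Y%m%d).html"

# The inner heredoc delimiter is unquoted so that $(...) is expanded into the report.
# Minimal HTML layout: one heading per section, with the sar output in <pre> blocks.
cat > "$REPORT_FILE" << HTML
<html>
<head><title>System Performance Report</title></head>
<body>
<h1>System Performance Report - $(hostname)</h1>
<p>Generated on: $(date)</p>

<h2>CPU Usage</h2>
<pre>$(sar -u)</pre>

<h2>Memory Usage</h2>
<pre>$(sar -r)</pre>

<h2>Disk I/O</h2>
<pre>$(sar -d)</pre>

<h2>Network Statistics</h2>
<pre>$(sar -n DEV)</pre>
</body>
</html>
HTML

echo "Report generated: $REPORT_FILE"
EOF

chmod +x /monitoring/scripts/generate_report.sh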

Assignments and Evaluation

  1. What is the difference between load average and CPU usage? When does each one matter?
  2. How do you determine whether a bottleneck is in the CPU, memory, or disk I/O?
  3. Which tools are most effective for real-time monitoring and for historical monitoring? Give your reasons.
  4. What should be done when memory usage approaches 100%?
  5. Create a scenario: a database server is performing slowly. Write out the steps of a systematic performance analysis.

Case Study: Performance Troubleshooting Web Server

#!/bin/bash
# Web Server Performance Analysis Script

echo "=== WEB SERVER PERFORMANCE ANALYSIS ==="

# 1. Check current load
echo "1. System Load:"
uptime
echo ""

# 2. Check top processes
echo "2. Top Processes by CPU:"
ps aux --sort=-%cpu | head -10
echo ""

# 3. Check memory usage
echo "3. Memory Usage:"
free -h
echo ""

# 4. Check disk I/O
echo "4. Disk I/O:"
iostat -x 1 3
echo ""

# 5. Count TCP sockets on port 80 (the trailing space keeps :8080 and similar ports from matching)
echo "5. Web Server Connections:"
netstat -ant | grep -c ':80 '
echo ""

# 6. Check Apache/Nginx status
echo "6. Web Server Status:"
sudo systemctl status apache2 || sudo systemctl status nginx
echo ""

echo "=== ANALYSIS COMPLETE ==="