Session 11: System Performance Monitoring

Learning Objectives

After completing this lab, students will be able to:

Understand the critical system performance metrics (CPU, memory, disk, network)
Use command-line tools to monitor system performance in real time
Analyze historical performance data with sar and system metrics
Identify bottlenecks and performance issues
Configure alerting and automated monitoring

Supporting Theory

Key Performance Indicators

CPU Usage

Percentage of processor time spent in each state (user%, system%, idle%, iowait%)

Memory Utilization

RAM usage (used, free, cached, buffered) and swap activity

Disk I/O

Read/write operations, throughput, latency, and queue depth

Network Throughput

Bandwidth usage, packet statistics, error rates

Performance Troubleshooting

Load Average

Average system load over the last 1, 5, and 15 minutes: the number of processes that are running or waiting to run (on Linux, processes in uninterruptible I/O wait are counted as well)

Bottleneck Identification

A methodology for identifying the source of a performance problem

Capacity Planning

Capacity planning based on resource usage trends

Performance Baseline

Establishing normal performance levels as a reference for later comparison
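A load average only becomes meaningful once it is compared with the number of CPU cores. The sketch below divides the 1-minute load by the core count. The helper name `load_per_core` is hypothetical, and the live reading assumes a Linux system that exposes `/proc/loadavg`.

```shell
# load_per_core LOAD CORES -> prints LOAD / CORES with two decimals.
# (Hypothetical helper, not part of any standard tool.)
load_per_core() {
    LC_ALL=C awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Live reading, only attempted where /proc/loadavg exists (Linux).
if [ -r /proc/loadavg ]; then
    read -r load1 load5 load15 rest < /proc/loadavg
    cores=$(getconf _NPROCESSORS_ONLN)
    echo "1-min load $load1 on $cores cores = $(load_per_core "$load1" "$cores") per core"
fi
```

A value consistently above 1.00 per core suggests that work is queuing for CPU or I/O, which is the situation the threshold table later in this session flags.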

Performance Troubleshooting Methodology

Systematic Performance Analysis:

Identify the symptom - Response time slow? High resource usage?
Collect performance data - Use monitoring tools
Analyze and identify bottleneck - CPU, memory, disk, or network?
Implement solution - Configuration changes, optimization
Verify improvement - Measure before/after performance
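The last step above, verifying the improvement, comes down to comparing one measurement taken before the change with one taken after. A minimal sketch, assuming a metric where lower is better (for example, response time in seconds); the helper name `improvement_pct` and the sample numbers are hypothetical.

```shell
# improvement_pct BEFORE AFTER -> percentage reduction relative to BEFORE.
# Assumes a "lower is better" metric such as response time or iowait.
improvement_pct() {
    LC_ALL=C awk -v b="$1" -v a="$2" 'BEGIN {
        if (b == 0) { print "n/a"; exit }      # avoid division by zero
        printf "%.1f\n", (b - a) / b * 100
    }'
}

# Hypothetical example: response time measured at 200 ms before, 150 ms after.
echo "Improvement: $(improvement_pct 200 150)%"
```

Taking both measurements with the same tool, interval, and workload keeps the comparison fair.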

Environment and Tool Preparation

1. Install Monitoring Tools

                        # Update the system and install the full set of monitoring tools

                        sudo apt update && sudo apt upgrade -y

                        sudo apt install htop iotop nmon glances sysstat dstat net-tools nethogs iftop -y

                        # Install additional performance and diagnostic tools

                        sudo apt install stress-ng fio iperf3 sysbench strace lsof bc -y

                        # Enable sysstat to collect historical data

                        sudo systemctl enable sysstat

                        sudo systemctl start sysstat

                        # Create directories for monitoring output, owned by the current user

                        sudo mkdir -p /monitoring/{reports,logs,scripts}

                        sudo chown -R "$USER" /monitoring

2. Configure Sysstat

                        sudo nano /etc/default/sysstat

                        # Make sure data collection is enabled:

                        ENABLED="true"

                        # Restart the service

                        sudo systemctl restart sysstat

                        # Verify that data is being collected (the first sample may take several minutes to appear)

                        ls -la /var/log/sysstat/

Real-time System Monitoring

1. Monitoring with top and htop

                        # Basic top with auto-refresh

                        top

                        # htop: color display and interactive features

                        htop

                        # Start htop with a different sort column

                        htop -s PERCENT_CPU  # Sort by CPU usage

                        htop -s PERCENT_MEM  # Sort by memory usage

                        htop -s TIME         # Sort by process time

                        # Batch mode for scripting (single iteration, no interactive UI)

                        top -bn1 | head -20

2. Memory Monitoring with free and vmstat

                        # Check memory usage in human-readable units

                        free -h

                        # Refresh every 1 second

                        watch -n 1 free -h

                        # Detailed virtual memory statistics

                        vmstat 1 5     # 1-second interval, 5 samples

                        vmstat -a 1 5   # Show active/inactive memory

                        vmstat -s       # Summary statistics
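Because Linux uses otherwise idle RAM for cache, the "free" figure alone tends to overstate memory pressure. A rough sketch that instead derives usage from the `MemAvailable` field of `/proc/meminfo`; the helper name `mem_used_pct` is hypothetical, and the optional file argument exists only so the function can be tried on a saved copy.

```shell
# mem_used_pct [FILE] -> percentage of RAM that is NOT available to new workloads,
# computed as (MemTotal - MemAvailable) / MemTotal from a meminfo-style file.
mem_used_pct() {
    LC_ALL=C awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total > 0) printf "%.0f\n", (total - avail) / total * 100
            else exit 1                       # no MemTotal line found
        }' "${1:-/proc/meminfo}"
}

# Live reading, only attempted where /proc/meminfo exists (Linux).
if [ -r /proc/meminfo ]; then
    echo "Memory in use, excluding reclaimable cache: $(mem_used_pct)%"
fi
```

This figure corresponds to the "available" column of `free`, so it usually reads lower than a percentage computed from "free" memory alone.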

3. Disk I/O Monitoring with iostat and iotop

                        # Disk statistics extended

                        iostat -x 1 3    # Extended stats, 1s interval, 3 times

                        iostat -d 1 3    # Device utilization

                        # I/O per process real-time

                        sudo iotop

                        sudo iotop -o    # Show only active I/O

                        sudo iotop -P    # Show processes only
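The extended `iostat -x` report is wide, and its column order differs between sysstat versions. The sketch below therefore locates the `%util` column by its header name instead of by position. The helper name `busy_disks` and the threshold of 80 are hypothetical choices for illustration.

```shell
# busy_disks THRESHOLD -> reads `iostat -x` text on stdin and prints
# "DEVICE UTIL" for every device whose %util is at or above THRESHOLD.
busy_disks() {
    LC_ALL=C awk -v t="$1" '
        # Header line: remember which column holds %util.
        $1 ~ /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
        # Device lines: compare the %util value numerically.
        col && NF >= col && $col + 0 >= t { print $1, $col }
    '
}

# Usage (note that the first iostat report shows averages since boot):
#   iostat -x 1 3 | busy_disks 80
```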

4. Key Metrics Interpretation

Metric	Normal Range	Warning	Critical
CPU Usage	0-70%	70-85%	>85%
Memory Usage	0-80%	80-90%	>90%
Load Average	< CPU cores	1.5x CPU cores	>2x CPU cores
Disk I/O Wait	0-5%	5-20%	>20%
Swap Usage	0%	1-10%	>10%
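The percentage thresholds in the table can be applied mechanically in scripts. A minimal sketch, assuming that a value exactly on a boundary counts as the higher severity for WARNING and the lower one for CRITICAL (the table itself leaves the boundaries ambiguous); the function name `classify` is hypothetical.

```shell
# classify VALUE WARN CRIT -> prints NORMAL, WARNING, or CRITICAL.
# WARN and CRIT are the lower bounds of the warning and critical ranges.
classify() {
    LC_ALL=C awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v > c)       print "CRITICAL"
        else if (v >= w) print "WARNING"
        else             print "NORMAL"
    }'
}

classify 72.5 70 85   # CPU usage of 72.5% -> WARNING
classify 91 80 90     # memory usage of 91% -> CRITICAL
```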

Historical Performance Data with SAR

1. Historical CPU Data Analysis

                        # CPU usage for today

                        sar -u

                        # CPU usage for a specific date

                        sar -u -f /var/log/sysstat/sa15  # sa15 = day 15 of the month; adjust to the date you need

                        # Export to a file

                        sar -u > /monitoring/reports/cpu_usage.txt

                        # Per-core CPU usage (live: 1-second interval, 3 samples)

                        sar -P ALL 1 3
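For a baseline, a single number per day is often easier to track than the full `sar -u` table. The sketch below reads `sar -u` text and reports the average busy percentage as 100 minus the `%idle` value on the last `Average:` line. The helper name `cpu_busy_avg` is hypothetical, and it assumes English-language `sar` output (hence `LC_ALL=C` in the usage line).

```shell
# cpu_busy_avg -> reads `sar -u` text on stdin and prints 100 - %idle
# taken from the last "Average:" line for all CPUs.
cpu_busy_avg() {
    LC_ALL=C awk '
        $1 == "Average:" && $2 == "all" { busy = 100 - $NF; found = 1 }
        END {
            if (found) printf "%.2f\n", busy
            else exit 1                       # no Average line in the input
        }'
}

# Usage (assumes sysstat has already collected data for today):
#   LC_ALL=C sar -u | cpu_busy_avg
```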

2. Historical Memory Analysis

                        # Historical memory usage

                        sar -r

                        sar -S  # Swap usage

                        # Combine multiple metrics

                        sar -ur 1 3  # CPU and memory, 1-second interval, 3 samples

                        # Paging statistics

                        sar -B  # Paging activity

                        sar -W  # Swap-in and swap-out rates

3. Historical Disk I/O

                        # Historical disk activity

                        sar -d

                        sar -b  # I/O and transfer rates

                        # Block device statistics with readable device names

                        sar -dp 1 5

                        # Kernel table usage (sar -v does not report filesystem space)

                        sar -v  # Inode, file, and other kernel tables

4. Historical Network Statistics

                        # Network interface statistics

                        sar -n DEV  # Network devices

                        sar -n EDEV # Network device errors

                        sar -n SOCK # Sockets

                        # TCP statistics

                        sar -n TCP

                        sar -n ETCP # TCP errors

Advanced Monitoring Tools

1. Monitoring with NMON

                        # Real-time nmon monitoring

                        nmon

                        # Keyboard shortcuts for the various metrics:

                        c # CPU

                        m # Memory

                        d # Disk

                        n # Network

                        q # Quit

                        # Capture nmon data for later analysis

                        nmon -f -s 10 -c 30 -T -m /monitoring/logs/

                        # -f: write to a spreadsheet-format file, -s: interval in seconds, -c: number of snapshots,

                        # -T: include top processes, -m: directory where the output file is saved

2. All-in-one Monitoring with Glances

                        # Glances real-time monitoring

                        glances

                        # Enable or disable individual plugins

                        glances --disable-plugin diskio  # Disable a specific plugin

                        glances --enable-plugin network   # Enable a specific plugin

                        # Glances web interface

                        glances -w

                        # Access from a browser: http://IP-ADDRESS:61208

                        # Export Glances data to CSV for automation

                        glances --export csv --export-csv-file /monitoring/reports/glances.csv

3. Network Performance Monitoring

                        # Real-time bandwidth monitoring

                        sudo nethogs

                        sudo iftop

                        # Interface statistics with sar

                        sar -n DEV 1 3

                        # Raw interface statistics

                        cat /proc/net/dev

                        ip -s link
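The counters in `/proc/net/dev` are cumulative, so throughput has to be derived from two samples. A rough sketch of that calculation follows; the function names `iface_bytes` and `iface_rate` are hypothetical, and `eth0` in the usage line is only a placeholder interface name.

```shell
# iface_bytes IFACE [FILE] -> prints "RX_BYTES TX_BYTES" for IFACE,
# read from a /proc/net/dev-style file.
iface_bytes() {
    LC_ALL=C awk -v ifc="$1" '
        { sub(/:/, ": ") }                    # tolerate "eth0:123" with no space after the colon
        $1 == (ifc ":") { print $2, $10 }     # field 2 = RX bytes, field 10 = TX bytes
    ' "${2:-/proc/net/dev}"
}

# iface_rate IFACE [SECONDS] -> samples the counters twice and prints the average rate.
iface_rate() {
    ifc=$1; secs=${2:-1}
    before=$(iface_bytes "$ifc")
    [ -n "$before" ] || { echo "interface $ifc not found" >&2; return 1; }
    sleep "$secs"
    after=$(iface_bytes "$ifc")
    echo "$before $after $secs" | LC_ALL=C awk '{
        printf "rx %.1f KiB/s, tx %.1f KiB/s\n", ($3 - $1) / 1024 / $5, ($4 - $2) / 1024 / $5
    }'
}

# Usage (replace eth0 with an interface listed by `ip -s link`):
#   iface_rate eth0 5
```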

Process-level Monitoring

1. Detailed Process Analysis

                        # Process tree visualization

                        pstree

                        pstree -p  # With PIDs

                        # Process memory map (sshd runs as root, so sudo is needed)

                        sudo pmap $(pgrep sshd | head -1)

                        # Process limits

                        cat /proc/$(pgrep sshd | head -1)/limits

                        # Process open files

                        sudo lsof -p $(pgrep sshd | head -1)

2. Strace for Process Debugging

                        # Trace system calls real-time

                        sudo strace -p $(pgrep sshd | head -1)

                        # Trace with a per-syscall summary

                        strace -c ls /

                        # Trace process execution

                        strace -f -o trace.log command

                        # Monitor specific system calls

                        strace -e trace=file command

3. Advanced Process Monitoring

                        # Process resource usage

                        ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head -10

                        # Process state information

                        ps -eo pid,state,cmd

                        # Process priority and nice values

                        ps -eo pid,ni,pri,cmd
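A quick count of processes per state helps connect the `ps` output to the load average: many processes in state R point toward CPU contention, while many in state D (uninterruptible sleep) point toward I/O. A small sketch, with the hypothetical helper name `state_summary`:

```shell
# state_summary -> reads one process state per line on stdin
# (as produced by `ps -eo state=`) and prints "STATE COUNT" per state.
state_summary() {
    LC_ALL=C awk '{ n[substr($1, 1, 1)]++ }
                  END { for (s in n) print s, n[s] }' | LC_ALL=C sort
}

# Usage:
#   ps -eo state= | state_summary
```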

Performance Benchmarking

1. CPU Benchmarking

                        # CPU stress test (4 workers for 30 seconds)

                        stress-ng --cpu 4 --timeout 30s --metrics-brief

                        # CPU benchmark with sysbench

                        sysbench cpu --cpu-max-prime=20000 run

                        # Single thread vs multi-thread

                        sysbench cpu --threads=1 run

                        sysbench cpu --threads=4 run

2. Disk I/O Benchmark

                        # Sequential read/write test

                        fio --name=seq_read --rw=read --bs=1M --size=1G --numjobs=1 --runtime=60 --time_based

                        # Random read/write test

                        fio --name=randrw --rw=randrw --bs=4k --size=1G --numjobs=4 --runtime=60 --time_based

                        # Disk throughput test

                        dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync

                        # Disk latency test

                        dd if=/dev/zero of=/tmp/test2.img bs=512 count=1000 oflag=dsync

3. Memory Benchmark

                        # Memory stress test

                        stress-ng --vm 2 --vm-bytes 1G --timeout 30s --metrics-brief

                        # Memory bandwidth test

                        sysbench memory --memory-total-size=2G run

                        # Cache benchmark

                        sysbench memory --memory-block-size=1K --memory-total-size=10G run

4. Network Benchmark

                        # Server side (run on remote host)

                        iperf3 -s

                        # Client side throughput test

                        iperf3 -c server_ip -t 30

                        # Bidirectional test (requires iperf3 3.7 or newer; in iperf3, -d enables debug output)

                        iperf3 -c server_ip -t 30 --bidir

                        # Multiple parallel streams

                        iperf3 -c server_ip -t 30 -P 4

Automated Monitoring Scripts

1. Bash Script for System Health Check

                        cat > /monitoring/scripts/health_check.sh << 'EOF'

                        #!/bin/bash

                        # System Health Check Script

                        LOG_FILE="/monitoring/logs/health_$(date +%Y%m%d).log"

                        THRESHOLD_CPU=80

                        THRESHOLD_MEM=90

                        THRESHOLD_DISK=85

                        echo "=== SYSTEM HEALTH CHECK - $(date) ===" >> $LOG_FILE

                        # CPU Check: total busy time, computed as 100 minus the idle percentage

                        CPU_USAGE=$(LC_ALL=C top -bn1 | grep "Cpu(s)" | sed 's/.*, *\([0-9.]*\)[ %]*id.*/\1/' | awk '{print 100 - $1}')

                        echo "CPU Usage: $CPU_USAGE%" >> $LOG_FILE

                        if (( $(echo "$CPU_USAGE > $THRESHOLD_CPU" | bc -l) )); then

                            echo "WARNING: High CPU usage detected!" >> $LOG_FILE

                        fi

                        # Memory Check

                        MEM_USAGE=$(free | grep Mem | awk '{printf "%.0f", $3/$2 * 100.0}')

                        echo "Memory Usage: $MEM_USAGE%" >> $LOG_FILE

                        if [ $MEM_USAGE -gt $THRESHOLD_MEM ]; then

                            echo "WARNING: High memory usage detected!" >> $LOG_FILE

                        fi

                        # Disk Check

                        DISK_USAGE=$(df / | awk 'NR==2 {print $5}' | sed 's/%//')

                        echo "Disk Usage: $DISK_USAGE%" >> $LOG_FILE

                        if [ $DISK_USAGE -gt $THRESHOLD_DISK ]; then

                            echo "WARNING: High disk usage detected!" >> $LOG_FILE

                        fi

                        # Load Average

                        LOAD_AVG=$(cat /proc/loadavg | awk '{print $1,$2,$3}')

                        echo "Load Average: $LOAD_AVG" >> $LOG_FILE

                        echo "=== CHECK COMPLETED ===" >> $LOG_FILE

                        EOF

                        chmod +x /monitoring/scripts/health_check.sh

2. Cron Job for Automated Monitoring

                        # Set up a cron job that runs every 5 minutes

                        crontab -l 2>/dev/null > /tmp/mycron

                        echo "*/5 * * * * /monitoring/scripts/health_check.sh" >> /tmp/mycron

                        crontab /tmp/mycron

                        # Verify the cron job

                        crontab -l
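Running the cron setup steps more than once appends the same entry each time. One way to avoid duplicates is to filter the existing crontab and add the line only when it is missing. The helper name `ensure_cron_line` is hypothetical; this is a sketch, not the only way to do it.

```shell
# ensure_cron_line LINE -> reads an existing crontab on stdin and writes it
# back unchanged, appending LINE only if no identical line is present.
# (LINE is passed through awk -v, so it should not contain backslashes.)
ensure_cron_line() {
    awk -v line="$1" '
        { print; if ($0 == line) seen = 1 }
        END { if (!seen) print line }
    '
}

# Usage:
#   crontab -l 2>/dev/null \
#     | ensure_cron_line "*/5 * * * * /monitoring/scripts/health_check.sh" \
#     | crontab -
```

Because the function only transforms text, it can be checked on sample input before it is ever connected to the real crontab.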

Alerting and Notification

1. Script for Alerting on Critical Conditions

                        cat > /monitoring/scripts/alert_system.sh << 'EOF'

                        #!/bin/bash

                        # Alerting Script for Critical Conditions

                        CPU_THRESHOLD=90

                        MEM_THRESHOLD=95

                        DISK_THRESHOLD=90

                        CPU_USAGE=$(LC_ALL=C top -bn1 | grep "Cpu(s)" | sed 's/.*, *\([0-9.]*\)[ %]*id.*/\1/' | awk '{print 100 - $1}')

                        MEM_USAGE=$(free | grep Mem | awk '{printf "%.0f", $3/$2 * 100.0}')

                        DISK_USAGE=$(df / | awk 'NR==2 {print $5}' | sed 's/%//')

                        ALERT_MESSAGE=""

                        if (( $(echo "$CPU_USAGE > $CPU_THRESHOLD" | bc -l) )); then

                            ALERT_MESSAGE="CRITICAL: CPU usage is $CPU_USAGE% (Threshold: $CPU_THRESHOLD%)\n"

                        fi

                        if [ $MEM_USAGE -gt $MEM_THRESHOLD ]; then

                            ALERT_MESSAGE="${ALERT_MESSAGE}CRITICAL: Memory usage is $MEM_USAGE% (Threshold: $MEM_THRESHOLD%)\n"

                        fi

                        if [ $DISK_USAGE -gt $DISK_THRESHOLD ]; then

                            ALERT_MESSAGE="${ALERT_MESSAGE}CRITICAL: Disk usage is $DISK_USAGE% (Threshold: $DISK_THRESHOLD%)\n"

                        fi

                        if [ -n "$ALERT_MESSAGE" ]; then

                            # Requires a working mail command (for example, from the mailutils package)

                            echo -e "$ALERT_MESSAGE" | mail -s "SYSTEM ALERT: $(hostname)" admin@localhost

                            # For production, replace this with a Slack webhook, Telegram bot, etc.

                        fi

                        EOF

                        chmod +x /monitoring/scripts/alert_system.sh

Performance Analysis and Reporting

1. Generate Performance Report

                        cat > /monitoring/scripts/generate_report.sh << 'EOF'

                        #!/bin/bash

                        # Performance Report Generator

                        REPORT_FILE="/monitoring/reports/performance_$(date +%Y%m%d).html"

                        cat > "$REPORT_FILE" << HTML

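                        <html>

                        <head><title>System Performance Report</title></head>

                        <body>

                        <h1>System Performance Report - $(hostname)</h1>

                        <p>Generated on: $(date)</p>

                        <h2>CPU Usage</h2>

                        <pre>$(sar -u)</pre>

                        <h2>Memory Usage</h2>

                        <pre>$(sar -r)</pre>

                        <h2>Disk I/O</h2>

                        <pre>$(sar -d)</pre>

                        <h2>Network Statistics</h2>

                        <pre>$(sar -n DEV)</pre>

                        </body>

                        </html>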

                        HTML

                        echo "Report generated: $REPORT_FILE"

                        EOF

                        chmod +x /monitoring/scripts/generate_report.sh

Assignments and Evaluation

What is the difference between load average and CPU usage? When does each one matter?
How do you identify whether a bottleneck is in the CPU, memory, or disk I/O?
Which tools are most effective for real-time and historical monitoring? Justify your answer.
What should be done when memory usage approaches 100%?
Scenario: a database server is running slowly. Write out the steps of a systematic performance analysis.

Case Study: Performance Troubleshooting Web Server

                        #!/bin/bash

                        # Web Server Performance Analysis Script

                        echo "=== WEB SERVER PERFORMANCE ANALYSIS ==="

                        # 1. Check current load

                        echo "1. System Load:"

                        uptime

                        echo ""

                        # 2. Check top processes

                        echo "2. Top Processes by CPU:"

                        ps aux --sort=-%cpu | head -10

                        echo ""

                        # 3. Check memory usage

                        echo "3. Memory Usage:"

                        free -h

                        echo ""

                        # 4. Check disk I/O

                        echo "4. Disk I/O:"

                        iostat -x 1 3

                        echo ""

                        # 5. Check network connections

                        echo "5. Web Server Connections:"

                        netstat -ant | grep -c ':80 '  # Trailing space keeps ports such as :8080 from matching

                        echo ""

                        # 6. Check Apache/Nginx status

                        echo "6. Web Server Status:"

                        systemctl status apache2 --no-pager || systemctl status nginx --no-pager

                        echo ""

                        echo "=== ANALYSIS COMPLETE ==="