Learning Objectives
After completing this lab, students will be able to:
- Understand a systematic methodology for troubleshooting Linux systems
- Identify and analyze problems at different layers of the system
- Use diagnostic tools for troubleshooting
- Repair problems in the boot process, filesystems, networking, and services
- Produce comprehensive troubleshooting documentation
Supporting Theory
1. Identify
Recognize the symptoms and gather initial information
2. Analyze
Analyze the potential causes based on the evidence
3. Plan
Plan the appropriate corrective action
4. Implement
Apply the fix carefully
5. Verify
Verify that the problem is resolved
6. Document
Document the process and the outcome
Boot Issues
GRUB errors, kernel panic, filesystem corruption
Performance Issues
High load, memory leaks, I/O bottlenecks
Network Issues
Connectivity, DNS, firewall rules, routing
Service Issues
Service crashes, configuration errors, dependencies
Security Issues
Unauthorized access, malware, misconfigurations
Troubleshooting Priority Matrix
| Impact | High Priority | Medium Priority | Low Priority |
| High | System down, data loss | Performance degradation | Minor service issues |
| Medium | Critical service outage | Partial functionality loss | Cosmetic issues |
| Low | Security breaches | Feature limitations | Documentation updates |
Preparing the Troubleshooting Environment
1. Set Up Directories and Install Tools
sudo mkdir -p /troubleshooting/{logs,scripts,backups}
sudo chmod 755 /troubleshooting
# On Debian/Ubuntu the command-line Wireshark tool is packaged as tshark
sudo apt update && sudo apt install -y \
sysstat dstat nmon htop iotop iftop nethogs \
strace ltrace lsof tcpdump tshark \
auditd fail2ban net-tools iproute2
2. Back Up System Configuration
sudo cp /etc/fstab /troubleshooting/backups/fstab.backup
sudo cp /etc/hosts /troubleshooting/backups/hosts.backup
sudo cp /etc/ssh/sshd_config /troubleshooting/backups/sshd_config.backup
# /etc/network/interfaces exists only on systems that use ifupdown
sudo cp /etc/network/interfaces /troubleshooting/backups/interfaces.backup
# The backup directory is owned by root, so write the list through sudo tee
dpkg --get-selections | sudo tee /troubleshooting/backups/package_list.txt > /dev/null
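Troubleshooting the Boot Process
1. Simulating a Boot Problem
# Back up the GRUB files before changing anything
sudo cp /boot/grub/grub.cfg /troubleshooting/backups/grub.cfg.backup
sudo cp /etc/default/grub /troubleshooting/backups/grub.default.backup
# Alternative, more destructive simulation (left disabled):
# sudo mv /boot/grub/grub.cfg /boot/grub/grub.cfg.corrupt
# Inject an invalid kernel parameter and regenerate grub.cfg
sudo sed -i 's/quiet/quiet broken_param/' /etc/default/grub
sudo update-grub
2. GRUB Recovery
# Run from a live USB. Mount the root and boot partitions
# (/dev/sda1 and /dev/sda2 are examples; confirm the layout with lsblk)
sudo mount /dev/sda1 /mnt
sudo mount /dev/sda2 /mnt/boot
sudo mount --bind /dev /mnt/dev
sudo mount --bind /proc /mnt/proc
sudo mount --bind /sys /mnt/sys
# Enter the installed system, reinstall GRUB, and regenerate its configuration
sudo chroot /mnt
grub-install /dev/sda
update-grub
exit
sudo reboot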
3. Troubleshooting Kernel Panic
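# Search the kernel ring buffer for errors (dmesg may require root)
sudo dmesg | grep -iE "error|panic|fail"
journalctl -k --since="1 hour ago"
# Inspect hardware that may be involved
lspci -v
lsusb -v
sudo dmidecode -t memory
# Check boot parameters and panic-related kernel settings
cat /proc/cmdline
sysctl -a 2>/dev/null | grep panic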
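The keyword search used above can be wrapped in a small filter so that the same pattern applies to live output and to saved logs alike. This is only a minimal sketch: the function name `filter_kernel_errors` and the extra `oops` keyword are assumptions made for illustration, not part of the lab material.

```shell
# Hypothetical helper: read kernel log text on stdin and print only the lines
# that mention an error, panic, failure, or oops (case-insensitive).
# "|| true" keeps the exit status at 0 when nothing matches.
filter_kernel_errors() {
    grep -iE 'error|panic|fail|oops' || true
}
```

Possible usage: `sudo dmesg | filter_kernel_errors` for the current boot, or `journalctl -k -b -1 | filter_kernel_errors` for the previous boot when the journal is persistent.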
4. Boot Process Stages and Troubleshooting
| Stage | Symptoms | Tools | Solutions |
| BIOS/UEFI | No display, beep codes | Hardware diagnostics | Check cables, RAM, CPU |
| Bootloader | GRUB rescue prompt | Live USB, chroot | GRUB reinstall, config fix |
| Kernel | Kernel panic, hang | dmesg, journalctl | Kernel parameters, drivers |
| Initramfs | Init failures, module errors | initrd debugging | Rebuild initramfs |
| Systemd | Service failures, target issues | systemctl, journalctl | Service fixes, dependencies |
Troubleshooting Filesystem Issues
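1. Simulating Filesystem Corruption
# Create a 100 MB image file and format it as ext4
sudo dd if=/dev/zero of=/tmp/testfs.img bs=1M count=100
sudo mkfs.ext4 /tmp/testfs.img
# Create the mount point, then mount the image through a loop device
sudo mkdir -p /mnt/test
sudo mount -o loop /tmp/testfs.img /mnt/test
sudo umount /mnt/test
# Force a check of the unmounted image
sudo fsck.ext4 -f /tmp/testfs.img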
2. Filesystem Repair
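# Run fsck only on an unmounted filesystem (/dev/sda1 is an example device)
sudo fsck /dev/sda1
# Answer "yes" to every repair prompt
sudo fsck -y /dev/sda1
# Force a full check on ext2/3/4 even if the filesystem is marked clean
sudo e2fsck -f -y /dev/sda1
# Identify filesystems, UUIDs, and current mounts
lsblk -f
sudo blkid
mount | grep /dev/sda1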
3. Disk Space Issues
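# Find the largest files and directories
sudo find / -type f -size +100M -exec ls -lh {} \; 2>/dev/null | head -20
sudo du -sh /* 2>/dev/null | sort -hr | head -10
# Reclaim space from unused packages, the package cache, and old journals
sudo apt autoremove -y
sudo apt autoclean -y
sudo journalctl --vacuum-time=7d
# Check inode usage and count files per top-level directory
df -i
sudo find / -xdev -type f 2>/dev/null | cut -d "/" -f 2 | sort | uniq -c | sort -n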
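The `df` checks above are easier to repeat when the "which filesystem is nearly full" decision is scripted. The sketch below assumes POSIX-format `df -P` output on stdin and mount points without spaces; the function name `over_threshold` and the default limit of 90 percent are illustrative choices, not values from the lab.

```shell
# Hypothetical helper: read `df -P` output on stdin and print
# "<mount point> <usage>" for every filesystem at or above the limit.
over_threshold() {
    local limit="${1:-90}"
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # strip the percent sign before comparing
        if (use + 0 >= limit) print $6, $5
    }'
}
```

Possible usage: `df -P | over_threshold 80` for block usage, and `df -P -i | over_threshold 80` for inode usage, since "No space left on device" can come from either.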
4. Filesystem Error Symptoms and Solutions
| Symptom | Possible Cause | Diagnosis Command | Solution |
| "Read-only filesystem" | Filesystem errors, hardware issues | dmesg \| grep error, smartctl | fsck, remount rw, check disk |
| "No space left on device" | Disk full, inode exhaustion | df -h, df -i, du -sh | Cleanup files, resize partition |
| "Input/output error" | Disk failure, cable issues | dmesg, smartctl, badblocks | Replace disk, check connections |
| "Stale file handle" | NFS issues, deleted files | showmount, lsof +L1 | Umount/remount, clear handles |
Troubleshooting Performance Issues
1. High CPU Usage Investigation
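# Snapshot of the busiest processes
top -b -n 1 | head -20
ps aux --sort=-%cpu | head -10
# Replace process_name with the target; pgrep -o selects the oldest matching PID
sudo strace -c -p "$(pgrep -o -f "process_name")"
sudo perf top -p "$(pgrep -o -f "process_name")"
# Load averages and per-CPU utilization
uptime
cat /proc/loadavg
mpstat -P ALL 1 3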
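A load average only means something relative to the number of CPUs, so it can help to normalize it. The following is a minimal sketch; the function name `load_status` and the 1.0 threshold are assumptions for illustration. Note that on Linux the load average also counts tasks in uninterruptible I/O wait, so a high ratio is not always a CPU problem.

```shell
# Hypothetical helper: divide a load average by the CPU count and label the
# result "overloaded" when there are more runnable tasks than cores.
load_status() {
    local load="$1" cpus="$2"
    LC_ALL=C awk -v l="$load" -v c="$cpus" 'BEGIN {
        ratio = l / c
        printf "%.2f %s\n", ratio, (ratio > 1.0 ? "overloaded" : "ok")
    }'
}
```

Possible usage: `load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"`.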
2. Memory Leak Detection
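free -h
cat /proc/meminfo
vmstat 1 5
# Per-process memory map (process_name is a placeholder)
pmap -x "$(pgrep -o -f "process_name")"
# valgrind is not in the package list installed earlier; install it separately if needed
valgrind --leak-check=yes ./application
sudo /usr/lib/linux-tools/*/perf mem record ./application
# Kernel slab usage (reading /proc/slabinfo requires root)
sudo slabtop -o
sudo head -20 /proc/slabinfo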
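One-off snapshots rarely reveal a leak; what matters is whether a process's resident memory keeps growing under a steady workload. The sketch below samples the `VmRSS` field from `/proc/<pid>/status` at intervals. The function names `rss_kb` and `sample_rss` and the default count and interval are assumptions made for illustration.

```shell
# Hypothetical helper: print the resident set size (VmRSS, in kB) recorded in
# a /proc/<pid>/status style file.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Hypothetical helper: sample a PID several times and print "<time> <rss_kb>".
# Returns 1 as soon as the process can no longer be read.
sample_rss() {
    local pid="$1" count="${2:-5}" interval="${3:-1}" i=0
    while [ "$i" -lt "$count" ]; do
        [ -r "/proc/$pid/status" ] || return 1
        printf '%s %s\n' "$(date +%T)" "$(rss_kb "/proc/$pid/status")"
        sleep "$interval"
        i=$((i + 1))
    done
}
```

Possible usage: `sample_rss "$(pgrep -o -f process_name)" 10 5` takes ten samples five seconds apart. Steady growth is a hint worth following up with valgrind, not proof of a leak on its own.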
3. I/O Bottleneck Analysis
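# iotop needs root; -o shows only processes that are actually doing I/O
sudo iotop -o
iostat -x 1 3
sudo lsof +D /var/log
sudo fuser -v /dev/sda1
# ioping is not in the package list installed earlier; install it separately if needed
sudo ioping -c 10 /
# Queue depth and active I/O scheduler for the disk (sda is an example)
cat /sys/block/sda/queue/nr_requests
cat /sys/block/sda/queue/scheduler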
4. Performance Issue Patterns
| Pattern | Likely Cause | Diagnosis Tools | Resolution |
| High CPU, low I/O wait | CPU-bound application | top, perf, strace | Optimize code, scale horizontally |
| High I/O wait, low CPU | Disk bottleneck | iostat, iotop, fio | Upgrade storage, optimize I/O |
| High memory, swapping | Memory leak/insufficient RAM | free, vmstat, valgrind | Add RAM, fix memory leaks |
| High load, low resource usage | Process contention, locks | strace, lsof, ipcs | Identify blocking processes |
Troubleshooting Network Issues
1. Connectivity Testing
ping -c 4 8.8.8.8
ping -c 4 google.com
traceroute google.com
mtr google.com
nslookup google.com
dig google.com
dig @8.8.8.8 google.com
# resolvectl replaces the deprecated systemd-resolve command on current systemd releases
resolvectl status
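The order of the tests above matters: pinging an IP address before a hostname separates reachability problems from name-resolution problems. The sketch below encodes that decision. The function names `classify_network_fault` and `probe` and the wording of the verdicts are assumptions for illustration, and a failed ping can also mean that ICMP is filtered, so treat the result as a first hint only.

```shell
# Hypothetical helper: classify a connectivity fault from three yes/no results:
# gateway reachable, external IP reachable, hostname resolvable.
classify_network_fault() {
    local gateway_ok="$1" ip_ok="$2" dns_ok="$3"
    if [ "$gateway_ok" != yes ]; then
        echo "local link or gateway problem"
    elif [ "$ip_ok" != yes ]; then
        echo "routing or upstream problem"
    elif [ "$dns_ok" != yes ]; then
        echo "DNS problem"
    else
        echo "basic connectivity ok"
    fi
}

# Run a command quietly and print "yes" or "no" depending on its exit status.
probe() {
    if "$@" > /dev/null 2>&1; then echo yes; else echo no; fi
}

# Example wiring (commented out so the block only defines functions):
# gw=$(ip route | awk '/^default/ { print $3; exit }')
# classify_network_fault \
#     "$(probe ping -c 2 -W 2 "$gw")" \
#     "$(probe ping -c 2 -W 2 8.8.8.8)" \
#     "$(probe getent hosts google.com)"
```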
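2. Port and Service Checking
# Listening sockets; sudo is needed to see processes owned by other users
sudo netstat -tulnp
sudo ss -tulnp
sudo lsof -i :80
# Test whether a remote port is reachable
nc -zv google.com 80
telnet google.com 80
nmap -p 80 google.com
# Review firewall rules
sudo iptables -L -n -v
sudo ufw status verbose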
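3. Packet Capture and Analysis
# ens33 is an example interface name; list yours with: ip -br link
# Capture to a file; stop with Ctrl+C
sudo tcpdump -i ens33 -w /troubleshooting/logs/network_capture.pcap
# Filter the saved capture by protocol
sudo tshark -r /troubleshooting/logs/network_capture.pcap -Y "http"
sudo tshark -r /troubleshooting/logs/network_capture.pcap -Y "dns"
# Live capture on a specific port
sudo tcpdump -i ens33 -n port 80
sudo tshark -i ens33 -f "tcp port 443"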
4. Network Issue Isolation
| Layer | Symptoms | Diagnosis Tools | Common Solutions |
| Physical | No link, packet loss | ethtool, ip link, dmesg | Check cables, NIC, drivers |
| Network | No route to host | ip route, traceroute, ping | Fix routing, gateway config |
| Transport | Connection refused | netstat, ss, nmap | Check services, firewall |
| Application | Service errors, timeouts | curl, telnet, logs | Fix app config, dependencies |
Troubleshooting Service Issues
1. Service Status Investigation
sudo systemctl status apache2
sudo systemctl is-enabled apache2
sudo systemctl is-active apache2
sudo journalctl -u apache2 --since="1 hour ago"
sudo tail -f /var/log/apache2/error.log
sudo systemctl --failed
sudo journalctl -p 3 -xb
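When several services have to be checked at once, the `systemctl is-active` call shown above can be looped. This is a minimal sketch; the function name `check_services` and the `unknown` fallback label are assumptions for illustration.

```shell
# Hypothetical helper: print "<service> <state>" for each argument and return
# non-zero if any service is not active.
check_services() {
    local svc state failed=0
    for svc in "$@"; do
        state=$(systemctl is-active "$svc" 2>/dev/null)
        state=${state:-unknown}    # systemctl missing or produced no output
        printf '%-20s %s\n' "$svc" "$state"
        [ "$state" = active ] || failed=1
    done
    return "$failed"
}
```

Possible usage: `check_services apache2 ssh cron || sudo systemctl --failed`, which lists the failed units only when at least one of the named services is not active.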
2. Service Dependency Checking
systemctl list-dependencies apache2
systemctl list-dependencies apache2 --reverse
systemctl list-unit-files --state=enabled
# "failed" is a unit state, not a unit-file state, so query list-units instead
systemctl list-units --state=failed
systemd-cgtop
systemctl show apache2 -p MemoryCurrent,CPUUsage
3. Configuration Validation
sudo apache2ctl configtest
sudo nginx -t
sudo sshd -t
sudo named-checkconf
# Assumes apache2.conf was copied to the backup directory beforehand
diff /etc/apache2/apache2.conf /troubleshooting/backups/apache2.conf.backup
namei -l /etc/apache2/apache2.conf
ls -la /etc/apache2/
4. Service Recovery Procedures
| Issue | Recovery Steps | Verification | Prevention |
| Service crash | Restart service, check logs, fix config | Service status, functionality test | Monitoring, resource limits |
| Dependency failure | Check dependent services, restart chain | All services running, dependencies met | Proper service ordering |
| Configuration error | Validate config, restore backup, test | Config test, service start | Config management, testing |
| Resource exhaustion | Increase limits, optimize, add resources | Resource monitoring, performance | Capacity planning, monitoring |
Security Incident Response
1. Unauthorized Access Detection
sudo lastb -a
sudo grep "Failed password" /var/log/auth.log
sudo fail2ban-client status
# -w matches whole words only; the final grep drops the grep process itself
ps aux | grep -wE "curl|wget|nc|netcat|telnet" | grep -v grep
sudo lsof -i | grep ESTABLISHED
sudo crontab -l
sudo ls -la /etc/cron.*
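A raw list of "Failed password" lines is hard to read during an incident; grouping the attempts by source address shows quickly whether one host is brute-forcing the system. The sketch below assumes the usual OpenSSH log wording, in which the address follows the word `from`. The function name `failed_logins_by_ip` is an illustrative choice.

```shell
# Hypothetical helper: read sshd log lines on stdin and print
# "<count> <source address>" for failed password attempts, busiest first.
failed_logins_by_ip() {
    awk '/Failed password/ {
        for (i = 1; i < NF; i++)
            if ($i == "from") { print $(i + 1); break }
    }' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

Possible usage: `sudo cat /var/log/auth.log | failed_logins_by_ip | head`.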
2. Malware Scanning
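# chkrootkit and rkhunter are separate packages, installed here together with ClamAV
sudo apt install -y clamav chkrootkit rkhunter
sudo freshclam
sudo clamscan -r -i /home/
sudo chkrootkit
sudo rkhunter --check
# Recently modified PHP files and all .ssh directories (root can read every path)
sudo find / -name "*.php" -mtime -1 2>/dev/null
sudo find / -name ".ssh" -type d 2>/dev/null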
3. Forensic Analysis
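# Both tools compare against a baseline database that must have been initialized before the incident
sudo aide --check
sudo tripwire --check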
sudo find / -mtime -1 -type f -exec ls -la {} \; 2>/dev/null | head -20
sudo netstat -tulnp | grep -v 127.0.0.1
sudo ss -tulnp | grep -v 127.0.0.1
4. Incident Response Checklist
| Phase | Actions | Tools | Documentation |
| Preparation | Backup systems, establish procedures | Backup tools, documentation | Incident response plan |
| Identification | Detect incident, assess impact | Monitoring, log analysis | Incident report |
| Containment | Isolate systems, prevent spread | Firewall, network isolation | Containment actions |
| Eradication | Remove threat, restore systems | Malware scanners, backups | Remediation steps |
| Recovery | Restore operations, verify systems | Backup restoration, testing | Recovery validation |
| Lessons Learned | Analyze incident, improve processes | Post-mortem analysis | Improvement plan |
Automated Troubleshooting Scripts
1. System Health Check Script
# The scripts directory is owned by root, so write the file through sudo tee
sudo tee /troubleshooting/scripts/health_check.sh > /dev/null << 'EOF'
#!/bin/bash
# Collect a snapshot of system health into a timestamped log file (run with sudo)
LOG_FILE="/troubleshooting/logs/health_$(date +%Y%m%d_%H%M%S).log"
echo "=== SYSTEM HEALTH CHECK ===" > "$LOG_FILE"
echo "Date: $(date)" >> "$LOG_FILE"
echo "Hostname: $(hostname)" >> "$LOG_FILE"
echo "" >> "$LOG_FILE"
echo "=== CPU AND LOAD ===" >> "$LOG_FILE"
uptime >> "$LOG_FILE"
top -bn1 | head -5 >> "$LOG_FILE"
echo "" >> "$LOG_FILE"
echo "=== MEMORY USAGE ===" >> "$LOG_FILE"
free -h >> "$LOG_FILE"
echo "" >> "$LOG_FILE"
echo "=== DISK USAGE ===" >> "$LOG_FILE"
df -h >> "$LOG_FILE"
echo "" >> "$LOG_FILE"
echo "=== SERVICE STATUS ===" >> "$LOG_FILE"
systemctl list-units --state=failed >> "$LOG_FILE"
echo "" >> "$LOG_FILE"
echo "=== NETWORK CONNECTIONS ===" >> "$LOG_FILE"
netstat -tulnp | grep LISTEN >> "$LOG_FILE"
echo "" >> "$LOG_FILE"
echo "Health check completed: $LOG_FILE"
EOF
sudo chmod +x /troubleshooting/scripts/health_check.sh
2. Incident Response Script
# The scripts directory is owned by root, so write the file through sudo tee
sudo tee /troubleshooting/scripts/incident_response.sh > /dev/null << 'EOF'
#!/bin/bash
# Capture volatile system state into a timestamped incident directory (run with sudo)
INCIDENT_DIR="/troubleshooting/incident_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$INCIDENT_DIR"
date > "$INCIDENT_DIR/timestamp.txt"
uname -a > "$INCIDENT_DIR/system_info.txt"
ps aux > "$INCIDENT_DIR/processes.txt"
lsof > "$INCIDENT_DIR/open_files.txt"
netstat -tulnp > "$INCIDENT_DIR/network_connections.txt"
iptables -L -n > "$INCIDENT_DIR/firewall_rules.txt"
tail -100 /var/log/auth.log > "$INCIDENT_DIR/auth_log.txt"
tail -100 /var/log/syslog > "$INCIDENT_DIR/syslog.txt"
echo "Incident data collected: $INCIDENT_DIR"
EOF
sudo chmod +x /troubleshooting/scripts/incident_response.sh
Documentation and Reporting
1. Troubleshooting Report Template
sudo tee /troubleshooting/scripts/report_template.md > /dev/null << 'EOF'
- **Date**:
- **Time**:
- **System**:
- **Reported Issue**:
-
-
-
1.
2.
3.
-
-
-
EOF
2. Troubleshooting Log Template
| Time | Action Taken | Results | Next Steps |
| 14:00 | Checked system load | Load avg: 5.2, high | Investigate processes |
| 14:05 | Identified process | MySQL using 90% CPU | Check MySQL queries |
| 14:10 | Analyzed queries | Slow query detected | Optimize query |
| 14:20 | Optimized query | CPU usage dropped to 40% | Monitor system |
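Keeping a log in the format above is easier when each row is appended at the moment an action is taken. The following is a minimal sketch; the function name `log_step` and the log path in the usage note are assumptions for illustration.

```shell
# Hypothetical helper: append one "| time | action | result | next step |" row
# to a troubleshooting log file, stamped with the current time.
log_step() {
    local file="$1" action="$2" result="$3" next="$4"
    printf '| %s | %s | %s | %s |\n' "$(date +%H:%M)" "$action" "$result" "$next" >> "$file"
}
```

Possible usage: `log_step ~/incident.log "Checked system load" "Load avg: 5.2, high" "Investigate processes"`. Writing to a file under `/troubleshooting/logs` would need root, because that directory is created with sudo.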
Assignments and Evaluation
- Explain the systematic steps to take when a system fails to boot.
- How can you distinguish between network problems caused by DNS, a firewall, or physical connectivity?
- Which tools are most effective for identifying a memory leak in a process?
- What is the proper incident response procedure when suspicious activity is detected on a system?
- Scenario: a web server suddenly responds very slowly. Write down the troubleshooting steps you would take.
Case Study: Database Performance Troubleshooting
echo "Starting database performance investigation..."
echo "=== SYSTEM RESOURCES ==="
top -bn1 | head -10
free -h
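echo "=== DATABASE PROCESSES ==="
# The bracket expressions keep grep from matching its own command line
ps aux | grep -E "[m]ysql|[p]ostgres" | head -10
echo "=== DISK I/O ==="
iostat -x 1 3
echo "=== DATABASE CONNECTIONS ==="
# sudo is needed for netstat to show the owning process of each socket
sudo netstat -tulnp | grep -E "(3306|5432)"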
echo "=== DATABASE LOGS ==="
tail -20 /var/log/mysql/error.log 2>/dev/null || tail -20 /var/log/postgresql/postgresql-*.log 2>/dev/null
echo "Initial investigation completed. Review outputs for further analysis."