linux-sysadmin

Expert Linux system administration covering process and service management with systemd, advanced networking with modern tools, storage and LVM, performance analysis toolkit, user permissions and SSH hardening, log management, and system internals via /proc and /sys.

Linux Sysadmin

Linux system administration underpins most cloud infrastructure. Understanding how the
kernel manages processes, memory, and I/O lets you diagnose problems that monitoring tools only show
as symptoms. Many of the tools are decades old, and the diagnostic patterns apply to any Linux system.

Core Mental Model

Every Linux system is a hierarchy: the kernel manages hardware, the init system (systemd) manages services,
and everything else is a process in a tree rooted at PID 1. Problems have a root cause: a process
consuming too much CPU, a leaked file descriptor, a network socket stuck in TIME_WAIT, a disk filling up.
The skill is navigating the tool chain (top → strace → lsof → ss → tcpdump) to narrow from symptom
to cause. The /proc filesystem is the kernel's real-time self-portrait, and most of these tools just read from it.
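
The process tree can be read straight from /proc without any extra tooling. The following is a minimal sketch, assuming a hypothetical helper named `ancestry`; it follows the `PPid` field in `/proc/<PID>/status` from a starting PID up to the top of the tree. Inside a container the chain may end at a parent PID of 0 without passing through PID 1.

```shell
#!/bin/sh
# Walk from a PID up through its ancestors using only /proc.
# Prints one "PID NAME" line per process, starting with the given PID.
# (ancestry is a hypothetical helper name, not a standard tool.)
ancestry() (
  pid="${1:-$$}"
  while [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
    # Name: holds the command name; only its first word is printed here
    name=$(awk '/^Name:/ {print $2; exit}' "/proc/$pid/status")
    ppid=$(awk '/^PPid:/ {print $2; exit}' "/proc/$pid/status")
    echo "$pid $name"
    pid="${ppid:-0}"
  done
)

ancestry "$$"
```

This is the same information `ps -o pid,ppid,comm` reports; reading it by hand shows where the numbers come from.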

systemd: Service Management

Unit File for a Production Service

# /etc/systemd/system/order-api.service
[Unit]
Description=Order API Service
After=network-online.target postgresql.service
Wants=network-online.target
Requires=postgresql.service
# Start rate limiting belongs in [Unit]: give up after 3 starts within 60s
StartLimitIntervalSec=60
StartLimitBurst=3

[Service]
# systemd waits for sd_notify() before marking the unit "active"
Type=notify
User=order-api
Group=order-api
WorkingDirectory=/opt/order-api
ExecStart=/opt/order-api/bin/server --config /etc/order-api/config.yaml
# Reload config without restart
ExecReload=/bin/kill -HUP $MAINPID

# Environment
EnvironmentFile=/etc/order-api/env
Environment=PORT=8080

# Restart behavior
Restart=on-failure
RestartSec=5s

# Security hardening
# Prevent privilege escalation
NoNewPrivileges=yes
# Isolated /tmp
PrivateTmp=yes
# Whole filesystem read-only except /dev, /proc, /sys and ReadWritePaths
ProtectSystem=strict
ReadWritePaths=/var/lib/order-api /var/log/order-api
ProtectHome=yes
# Only needed capability
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
# Raise file descriptor limit
LimitNOFILE=65536

# Resource limits (cgroup v2)
MemoryMax=512M
# Max 2 CPU cores
CPUQuota=200%

[Install]
WantedBy=multi-user.target

# Essential systemd commands
systemctl {start|stop|restart|reload|status} order-api
systemctl {enable|disable} order-api # Enable/disable on boot
systemctl daemon-reload              # After editing unit files

# journalctl for logs
journalctl -u order-api              # All logs for unit
journalctl -u order-api -f           # Follow (tail -f equivalent)
journalctl -u order-api --since "1 hour ago"
journalctl -u order-api -n 100 --no-pager
journalctl -u order-api -p err       # Priority: emerg alert crit err warning notice info debug
journalctl --disk-usage              # How much journal space used
journalctl --vacuum-size=1G          # Trim old journals

# Analyze startup time
systemd-analyze blame
systemd-analyze critical-chain order-api.service
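
systemd treats `#` and `;` as comment markers only at the start of a line, so text written after a value on the same line becomes part of that value. The following is a minimal sketch, assuming a hypothetical helper named `inline_comments`, that lists such lines in a unit file. It is a heuristic: a value that legitimately contains ` #` would also be reported.

```shell
#!/bin/sh
# Print "LINE:content" for every Key=value line that carries trailing "# ..." text.
# (inline_comments is a hypothetical helper name, not part of systemd.)
inline_comments() {
  grep -nE '^[[:space:]]*[A-Za-z][A-Za-z0-9]*=.*[[:space:]]#' "$1"
}

# Usage: inline_comments /etc/systemd/system/order-api.service
```

`systemd-analyze verify` on the same file is the more thorough check, since it reports keys and values that systemd could not parse.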

Networking: Modern Toolkit

ip Commands (replace ifconfig/route)

# Interface management
ip addr show
ip addr add 10.0.1.10/24 dev eth0
ip addr del 10.0.1.10/24 dev eth0

# Route management
ip route show
ip route add 192.168.2.0/24 via 10.0.1.1 dev eth0
ip route add default via 10.0.1.1

# Network namespace (container networking internals)
ip netns list
ip netns exec my-ns ip addr show

# Socket stats (replace netstat -tulpn)
ss -tulpn                        # TCP+UDP, listening, with process
ss -s                            # Summary statistics
ss -t state established          # All established TCP connections
ss -o state TIME-WAIT            # Connections in TIME_WAIT
ss 'sport = :8080'               # Connections on port 8080

iptables vs nftables

# iptables: legacy but ubiquitous
iptables -L -n -v                # List all rules with packet counts
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
iptables -P INPUT DROP           # Default deny; set last so the current session survives
iptables-save > /etc/iptables/rules.v4  # Persist (path used by Debian's iptables-persistent)

# nftables: modern replacement
# /etc/nftables.conf
table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;
    
    # Allow established connections
    ct state established,related accept
    
    # Allow loopback
    iifname "lo" accept
    
    # Allow ICMP
    ip protocol icmp accept
    ip6 nexthdr icmpv6 accept
    
    # Rate limit new SSH connections to slow brute force.
    # These rules must come before any blanket accept for port 22.
    tcp dport 22 ct state new limit rate 5/minute accept
    tcp dport 22 ct state new drop

    # Allow HTTP, HTTPS
    tcp dport { 80, 443 } accept
  }
  
  chain forward {
    type filter hook forward priority 0; policy drop;
  }
  
  chain output {
    type filter hook output priority 0; policy accept;
  }
}

tcpdump Patterns

# Capture HTTP traffic on eth0
tcpdump -i eth0 -nn 'tcp port 80' -w /tmp/http.pcap

# Show DNS queries
tcpdump -i any -nn 'udp port 53'

# Capture traffic between two hosts
tcpdump -i eth0 'host 10.0.1.10 and host 10.0.1.20'

# Show packet contents (ASCII)
tcpdump -i eth0 -A 'tcp port 8080 and (tcp[tcpflags] & tcp-push != 0)'

# Capture and read without resolving hostnames
tcpdump -i eth0 -nn -s0 -w /tmp/capture.pcap
tcpdump -r /tmp/capture.pcap -nn 'tcp port 443'

Storage: LVM and Performance

# Disk overview
lsblk -f                         # Tree view with filesystem info
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT,FSTYPE
df -h                            # Disk usage
df -i                            # Inode usage (can run out of inodes!)
du -sh /var/log/* | sort -rh     # Find large directories

# LVM management
pvs; vgs; lvs                    # Show PVs, VGs, LVs
pvcreate /dev/sdb
vgcreate data_vg /dev/sdb
lvcreate -L 50G -n app_lv data_vg
mkfs.ext4 /dev/data_vg/app_lv
mount /dev/data_vg/app_lv /data

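# Extend LV online (no unmount needed for ext4/xfs), then grow the filesystem
lvextend -L +20G /dev/data_vg/app_lv
resize2fs /dev/data_vg/app_lv    # ext4: takes the device
xfs_growfs /data                 # xfs: takes the mount point (run only the one that matches)
# Or do both steps at once: lvextend -r -L +20G /dev/data_vg/app_lv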

# Performance analysis
iostat -xz 1 5                   # Extended I/O stats (util%, await, r/w/s)
iotop -o                         # Which processes doing I/O
fio --name=randread --ioengine=libaio --iodepth=32 \
  --rw=randread --bs=4k --direct=1 --size=1G --numjobs=4  # Disk benchmark
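
Block usage and inode usage are separate limits, so both are worth checking against a threshold. The following is a minimal sketch, assuming a hypothetical helper named `over_threshold` that reads `df -P` style output on stdin. Mount points containing spaces are not handled.

```shell
#!/bin/sh
# Read `df -P` or `df -Pi` output on stdin and print "MOUNT USE%" for every
# filesystem at or above the percentage given as $1.
# (over_threshold is a hypothetical helper name.)
over_threshold() {
  awk -v limit="$1" 'NR > 1 { pct = $5; sub(/%/, "", pct); if (pct + 0 >= limit) print $6, $5 }'
}

df -P  | over_threshold 90   # block usage
df -Pi | over_threshold 90   # inode usage
```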

Performance Analysis Toolkit

# CPU analysis
top -d 1                         # Refresh every second
htop                             # Better UI, tree view
atop                             # Historical view, shows killed processes

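# Load average interpretation
# load average: 1.2, 0.8, 0.5 (1min, 5min, 15min)
# On a 4-core system, sustained load > 4.0 means tasks are queuing for CPU.
# Linux also counts tasks in uninterruptible (D) sleep, so high load with idle CPUs points at I/O.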
nproc                            # Number of processors
uptime                           # Quick load average view

# Memory analysis
free -h
vmstat 1 5                       # Virtual memory, CPU, I/O snapshot
grep -E "^(MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree):" /proc/meminfo

# Process investigation
ps aux --sort=-%mem | head -20   # Top memory consumers
ps aux --sort=-%cpu | head -20   # Top CPU consumers
pmap -x <PID>                    # Memory map of a process

# strace: trace system calls
strace -p <PID>                  # Attach to running process
strace -c -p <PID>               # Count syscalls (statistics)
strace -e trace=openat ls /etc   # Trace only openat() calls (the syscall behind open() on modern glibc)
strace -f -e trace=network curl http://example.com  # Network syscalls only

# lsof: list open files and sockets
lsof -p <PID>                    # All files opened by PID
lsof -i :8080                    # Sockets using port 8080 (add -sTCP:LISTEN for listeners only)
lsof -i tcp -n                   # All TCP connections
lsof +D /var/log                 # All files open in directory
lsof -u username                 # All files by user

# Find who is using a file
fuser -v /var/log/app.log
fuser -k 8080/tcp                # Kill (SIGKILL by default) processes using port 8080
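
The load average rule of thumb in this section comes down to dividing the load by the core count. The following is a minimal sketch, assuming a hypothetical helper named `load_per_core`; a result near or above 1.00 suggests the system is saturated.

```shell
#!/bin/sh
# Divide a load average by the number of cores and print it with two decimals.
# (load_per_core is a hypothetical helper name.)
load_per_core() {
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# /proc/loadavg starts with the 1, 5 and 15 minute averages
if [ -r /proc/loadavg ]; then
  read -r one five fifteen rest < /proc/loadavg
  load_per_core "$one" "$(nproc)"
fi
```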

SSH Hardening

# /etc/ssh/sshd_config hardening
PermitRootLogin no
# Key auth only
PasswordAuthentication no
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys

# Disable less-used auth methods
ChallengeResponseAuthentication no
KerberosAuthentication no
GSSAPIAuthentication no
UsePAM yes

# Restrict to specific users/groups
AllowGroups ssh-users admin

# Timeout settings
LoginGraceTime 30
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 3

# Disable X11/agent forwarding if not needed
X11Forwarding no
AllowAgentForwarding no
# Strict: disable port forwarding
AllowTcpForwarding no

# Use stronger algorithms only
KexAlgorithms curve25519-sha256,[email protected]
Ciphers [email protected],[email protected]
MACs [email protected],[email protected]

# Non-default port; 0.0.0.0 listens on all IPv4 addresses (set a specific IP to restrict)
Port 2222
ListenAddress 0.0.0.0

# Banner (legal notice)
Banner /etc/ssh/banner

# Test new sshd_config before disconnecting
sshd -t                          # Test config syntax
sshd -T                          # Dump effective configuration
# Always test in a SECOND session before disconnecting the first!
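
A quick static check can catch the two settings this section treats as non-negotiable before sshd is reloaded. The following is a minimal sketch, assuming hypothetical helpers named `sshd_setting` and `audit_sshd`. It reads a single file and ignores `Include` and `Match` blocks, so `sshd -T` remains the authoritative view of the effective configuration.

```shell
#!/bin/sh
# Print the first value set for a keyword in an sshd_config-style file.
# sshd matches keywords case-insensitively and uses the first value it finds.
# (sshd_setting and audit_sshd are hypothetical helper names.)
sshd_setting() {
  awk -v key="$2" 'tolower($1) == tolower(key) { print $2; exit }' "$1"
}

# Report the settings that are not explicitly set to "no".
audit_sshd() {
  [ "$(sshd_setting "$1" PermitRootLogin)" = "no" ] || echo "PermitRootLogin is not 'no'"
  [ "$(sshd_setting "$1" PasswordAuthentication)" = "no" ] || echo "PasswordAuthentication is not 'no'"
}

# Usage: audit_sshd /etc/ssh/sshd_config
```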

Log Management

# logrotate configuration
# /etc/logrotate.d/order-api
/var/log/order-api/*.log {
    daily
    rotate 30
    compress
    # Keep the most recent rotated file uncompressed (the app may still be writing to it)
    delaycompress
    # Don't error if the log is missing
    missingok
    # Don't rotate empty files
    notifempty
    # Owner and permissions for the new log file
    create 0640 order-api adm
    postrotate
        # Signal the app to reopen its log files
        systemctl kill -s HUP order-api.service
    endscript
}

# rsyslog: forward logs to centralized server
# /etc/rsyslog.d/50-forward.conf
*.* action(
    type="omfwd"
    target="logs.internal"
    port="514"
    protocol="tcp"
    action.resumeRetryCount="100"
    queue.type="linkedList"
    queue.size="10000"
    queue.saveonshutdown="on"
)

/proc and /sys Deep Dive

# /proc: kernel view of running system
cat /proc/cpuinfo                # CPU info
cat /proc/meminfo                # Memory stats
cat /proc/net/dev                # Network interface stats
cat /proc/net/tcp                # TCP connection table (hex!)
cat /proc/<PID>/maps             # Memory mappings
cat /proc/<PID>/status           # Process status, memory, threads
ls -la /proc/<PID>/fd            # Open file descriptors (fd is a directory of symlinks)
tr '\0' ' ' < /proc/<PID>/cmdline   # Command line (arguments are NUL-separated)
cat /proc/sys/net/core/somaxconn # Current listen backlog limit

# /sys: kernel parameter tuning
cat /sys/block/sda/queue/scheduler  # I/O scheduler (mq-deadline, none)
echo mq-deadline > /sys/block/sda/queue/scheduler

# sysctl: runtime kernel parameter tuning
sysctl -a | grep net.core
sysctl net.core.somaxconn        # Current value
sysctl -w net.core.somaxconn=65535  # Set immediately (lost on reboot)

# Persist in /etc/sysctl.d/99-custom.conf, then apply without rebooting
cat /etc/sysctl.d/99-custom.conf
sysctl --system                  # Reload all sysctl configuration files
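
The hex fields in /proc/net/tcp are easy to decode by hand. The following is a minimal sketch, assuming a hypothetical helper named `decode_addr` and a little-endian machine (x86-64, ARM64): the IPv4 address is stored as 32-bit hex in host byte order, the port is plain hex, and state `0A` means LISTEN.

```shell
#!/bin/sh
# Decode an IPv4 "ADDR:PORT" field from /proc/net/tcp, e.g. 0100007F:1F90.
# Assumes a little-endian host, so the address bytes are read in reverse.
# (decode_addr is a hypothetical helper name.)
decode_addr() (
  hex_ip=${1%:*}
  hex_port=${1#*:}
  b1=$(printf '%s' "$hex_ip" | cut -c7-8)
  b2=$(printf '%s' "$hex_ip" | cut -c5-6)
  b3=$(printf '%s' "$hex_ip" | cut -c3-4)
  b4=$(printf '%s' "$hex_ip" | cut -c1-2)
  echo "$((0x$b1)).$((0x$b2)).$((0x$b3)).$((0x$b4)):$((0x$hex_port))"
)

# Print the local address of every listening IPv4 TCP socket.
# Column 2 is local_address, column 4 is the state.
if [ -r /proc/net/tcp ]; then
  awk 'NR > 1 && $4 == "0A" { print $2 }' /proc/net/tcp | while read -r addr; do
    decode_addr "$addr"
  done
fi
```

This is roughly the data `ss -tln` presents, minus IPv6 (which lives in /proc/net/tcp6) and process names.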

Critical sysctl for Production Servers

# /etc/sysctl.d/99-production.conf

# Network: increase connection limits
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 65535

# TCP optimization
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_fin_timeout = 30
# Reuse TIME_WAIT sockets for new outgoing connections
net.ipv4.tcp_tw_reuse = 1

# File descriptor limits
fs.file-max = 2097152
fs.inotify.max_user_watches = 524288

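# VM: memory behavior
# Prefer RAM, use swap sparingly
vm.swappiness = 10
# Always allow overcommit (recommended by Redis for fork-based saves). Allocation
# failures become rarer, but OOM kills become more likely under real memory pressure.
vm.overcommit_memory = 1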
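
Each sysctl key maps to a file under /proc/sys, which makes it possible to check whether the running kernel matches a config file. The following is a minimal sketch, assuming hypothetical helpers named `sysctl_path`, `squash` and `sysctl_drift`. Keys that contain dots inside an interface name are not handled.

```shell
#!/bin/sh
# Map a sysctl key to its /proc/sys path: dots become slashes.
# (sysctl_path, squash and sysctl_drift are hypothetical helper names.)
sysctl_path() {
  echo "/proc/sys/$(printf '%s' "$1" | tr . /)"
}

# Collapse runs of whitespace and trim both ends.
squash() {
  tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//'
}

# Compare "key = value" lines in a sysctl.d-style file with the live values.
# Prints one line per key whose running value differs or cannot be read.
sysctl_drift() {
  grep -Ev '^[[:space:]]*([#;]|$)' "$1" | while IFS='=' read -r key want; do
    key=$(printf '%s' "$key" | squash)
    want=$(printf '%s' "${want%%#*}" | squash)
    have=$(cat "$(sysctl_path "$key")" 2>/dev/null | squash)
    [ "$have" = "$want" ] || echo "$key: want $want, have ${have:-unreadable}"
  done
}

# Usage: sysctl_drift /etc/sysctl.d/99-production.conf
```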

Anti-Patterns

❌ Running services as root: always create dedicated service accounts
❌ chmod 777 on any file or directory: always use minimal permissions
❌ Disabling SELinux/AppArmor entirely: fix policy violations, don't disable
❌ PasswordAuthentication yes in sshd_config: keys only in production
❌ No RestartSec in systemd units: with the default 100ms delay a crash loop can exhaust the start limit within seconds and leave the unit failed
❌ Ignoring inode exhaustion: df -h shows free space but the system can't create files; check df -i
❌ Not testing sshd config before reload: run sshd -t first and always keep a second session open
❌ nohup my-script & for long-running processes: use systemd, not nohup/screen
❌ Infinite log retention: every app that writes log files needs a logrotate configuration
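
The chmod 777 anti-pattern is easy to audit for. The following is a minimal sketch, assuming a hypothetical helper named `world_writable`; it lists files and directories writable by "other" and skips sticky-bit directories such as /tmp, where world-write is intentional.

```shell
#!/bin/sh
# List regular files and directories under $1 that anyone can write to.
# -perm -0002 matches the "other write" bit, which mode 777 includes.
# (world_writable is a hypothetical helper name.)
world_writable() {
  find "$1" -xdev \( -type f -o -type d \) -perm -0002 ! -perm -1000 2>/dev/null
}

# Usage: world_writable /opt/order-api
```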

Quick Reference

Service management:
  systemctl {start|stop|restart|status|enable|disable} SERVICE
  journalctl -u SERVICE -f                    # Follow service logs
  journalctl -u SERVICE --since "10 min ago"  # Recent logs

File descriptor limit troubleshooting:
  ulimit -n                     # Current limit for shell
  cat /proc/<PID>/limits        # Per-process limits
  # Fix: LimitNOFILE=65536 in service unit file

Find process using port:
  ss -tulpn | grep :8080
  fuser -v 8080/tcp
  lsof -i :8080

Disk space emergency:
  df -h                          # Find full filesystem
  du -sh /* 2>/dev/null | sort -rh | head  # Find largest dirs
  find /var/log -name "*.log" -size +100M  # Large log files
  journalctl --vacuum-size=1G    # Trim systemd journal

Performance triage order:
  1. top/htop                    → CPU, memory, load average
  2. iostat -xz 1               → I/O wait, disk utilization
  3. ss -s                       → Connection counts, socket states
  4. vmstat 1                    → Memory pressure, swap activity
  5. strace -c -p <PID>          → What syscalls is it blocking on?
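
The file descriptor check in the quick reference can be sketched as a comparison between the number of entries in `/proc/<PID>/fd` and the soft limit in `/proc/<PID>/limits`. This is a minimal sketch, assuming a hypothetical helper named `fd_usage`. Reading another user's process normally requires root.

```shell
#!/bin/sh
# Print "OPEN SOFT_LIMIT" for a PID using only /proc.
# (fd_usage is a hypothetical helper name.)
fd_usage() (
  pid="${1:-$$}"
  # Each entry in /proc/PID/fd is one open descriptor
  open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
  # "Max open files  SOFT  HARD  files": the soft limit is the 4th field
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits" 2>/dev/null)
  echo "$open $limit"
)

fd_usage "$$"
```

An open count that keeps climbing toward the limit over time is the usual signature of a descriptor leak.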

Skill Information

Source: MoltbotDen
Category: DevOps & Cloud