High Availability Clusters

A high availability (HA) cluster eliminates single points of failure by running services across multiple nodes. When the active node fails, a standby node takes over automatically — typically within seconds. On Ubuntu, Pacemaker (cluster resource manager) and Corosync (cluster communication layer) are the standard HA stack. The most common use case is a two-node active/passive cluster with a floating Virtual IP address that moves between nodes on failover.

HA concepts

Active/Passive HA cluster (2-node):

  node-01 (ACTIVE)          node-02 (STANDBY)
  +------------------+      +------------------+
  | VIP: 10.0.0.100  |      | No VIP           |
  | nginx: running   |      | nginx: stopped   |
  | app: running     |      | app: stopped     |
  +------------------+      +------------------+
          |
          | Corosync heartbeat (port 5404/5405)
          | Pacemaker STONITH fencing
          v
  node-01 fails (power loss, kernel panic):
  → Corosync detects the missing heartbeat (within a few seconds, set by the token timeout)
  → Pacemaker fences node-01 (STONITH: Shoot The Other Node In The Head)
  → Pacemaker moves VIP and resources to node-02
  → Total failover: 5-30 seconds
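The heartbeat in the diagram only works if each node can resolve and reach the other. The sketch below covers those prerequisites on both nodes. It assumes UFW is the active firewall and that the nodes' cluster addresses are 10.0.0.11 and 10.0.0.12. Those addresses are placeholders, not values from this article. Port 2224/tcp is used by pcsd, and the Corosync ports match the diagram above.

```bash
# Sketch: prerequisites on BOTH nodes. 10.0.0.11 / 10.0.0.12 are placeholder
# addresses; replace them with your nodes' real cluster-network IPs.

# Each node must resolve both cluster node names:
sudo tee -a /etc/hosts >/dev/null <<'EOF'
10.0.0.11 node-01
10.0.0.12 node-02
EOF

# If UFW is active, allow pcsd (2224/tcp) and Corosync (5404-5405/udp) from the
# peer node. Shown as run on node-01; on node-02 use 10.0.0.11 instead.
sudo ufw allow from 10.0.0.12 to any port 2224 proto tcp
sudo ufw allow from 10.0.0.12 to any port 5404:5405 proto udp
```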

Pacemaker and Corosync

# Install on BOTH nodes:
sudo apt install -y pacemaker corosync pcs crmsh

# Set a password for the hacluster user, which pcsd uses to authenticate nodes:
sudo passwd hacluster    # Set same password on both nodes

# Authenticate nodes with each other (run on node-01):
sudo pcs host auth node-01 node-02 -u hacluster

# Create the cluster (run on node-01):
sudo pcs cluster setup mycluster node-01 node-02
sudo pcs cluster start --all
sudo pcs cluster enable --all

pcs cluster status

Cluster Status:
 Stack: corosync
 Current DC: node-01 (version ...) - partition with quorum
 Last updated: Mon Jun 09 14:30:00 2025
 2 nodes configured
 0 resources configured

Online: [node-01 node-02]
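If you want to check this status from a script (for example in monitoring or before maintenance), you can parse the two lines that matter: quorum and the online node list. The sketch below assumes bash and reads `pcs cluster status` text on stdin. The function names `has_quorum` and `online_nodes` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: helpers for scripted health checks. Both read the text printed by
# `sudo pcs cluster status` on stdin; the function names are hypothetical.

# Succeed only if the current partition has quorum. Pacemaker prints either
# "partition with quorum" or "partition WITHOUT quorum"; grep is case-sensitive.
has_quorum() {
  grep -q 'partition with quorum'
}

# Print one node name per line from the "Online: [...]" line. Accepts both
# "Online: [node-01 node-02]" and the newer "* Online: [ node-01 node-02 ]".
online_nodes() {
  sed -n 's/^[ *]*Online: *\[ *\([^]]*[^] ]\) *\].*$/\1/p' | tr -s ' ' '\n'
}

# Example:
#   sudo pcs cluster status | has_quorum && echo "cluster is quorate"
#   sudo pcs cluster status | online_nodes
```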

Virtual IP failover

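# Add a Virtual IP resource (the floating IP). Replace eth0 with the node's
# interface name (Ubuntu often uses predictable names such as ens3; check with: ip -br addr):
sudo pcs resource create VirtualIP ocf:heartbeat:IPaddr2 \
    ip=10.0.0.100 cidr_netmask=24 nic=eth0 \
    op monitor interval=30s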

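# Add nginx as a resource. Pacemaker, not systemd, must start it, so disable
# the unit's boot-time start on BOTH nodes first:
sudo systemctl disable --now nginx
sudo pcs resource create WebServer systemd:nginx \
    op monitor interval=30s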

# Group the resources: members run on the same node, start in the listed
# order (VirtualIP, then WebServer), and stop in reverse order:
sudo pcs resource group add WebGroup VirtualIP WebServer

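# No separate colocation constraint is needed, because the group already keeps
# its members together. Without a group, the equivalent explicit constraints would be:
# sudo pcs constraint colocation add WebServer with VirtualIP INFINITY
# sudo pcs constraint order VirtualIP then WebServer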

# Check resource status:
sudo pcs status resources

pcs status resources

  * Resource Group: WebGroup:
    * VirtualIP  (ocf::heartbeat:IPaddr2):   Started node-01
    * WebServer  (systemd:nginx):            Started node-01
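The status above shows where Pacemaker placed the VIP. IPaddr2 adds the VIP to the interface as an extra address and sends gratuitous ARP so that neighbours update their caches. As a result, the address appears in `ip addr` on whichever node currently runs it. The sketch below confirms this on the node itself. It assumes bash and the `ip -br addr` output format; `holds_vip` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: check whether THIS node currently holds the floating IP.
# holds_vip is a hypothetical helper; it reads `ip -br addr` output on stdin
# and succeeds if any listed address (prefix length stripped) equals $1.
holds_vip() {
  awk -v vip="$1" '{
    for (i = 3; i <= NF; i++) {      # fields 1-2 are interface name and state
      split($i, a, "/")              # "10.0.0.100/24" -> a[1] = "10.0.0.100"
      if (a[1] == vip) found = 1
    }
  } END { exit !found }'
}

# Example (run on each node; eth0 is the interface assumed earlier):
#   ip -br addr show dev eth0 | holds_vip 10.0.0.100 && echo "VIP is here"
```

Exactly one node should succeed. If both nodes report the VIP, you are looking at the split-brain situation that fencing exists to prevent.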

Testing failover

# Controlled failover test: put node-01 in standby so it may not run resources:
sudo pcs node standby node-01
# → Resources should move to node-02

# Verify:
sudo pcs status    # VirtualIP should now show Started node-02
ping -c 3 10.0.0.100    # VIP still responds, now from node-02

# Restore node-01:
sudo pcs node unstandby node-01

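# Optional: prefer node-01 whenever it is online (score 50). Resources fail
# back automatically only if this score outweighs the resource-stickiness: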
sudo pcs constraint location WebGroup prefers node-01=50

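# Harder test (causes a brief outage). Stopping Corosync on node-01 is a clean
# shutdown, so it exercises failover but not fencing:
# sudo systemctl stop corosync    # on node-01
# To also exercise fencing, crash node-01 instead (lab only):
# echo c | sudo tee /proc/sysrq-trigger    # on node-01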
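Timing these tests by eye is imprecise. The sketch below polls `pcs status resources` once per second until a resource reports `Started` on the expected node, then prints the elapsed time. It assumes bash and the resource and node names used above. The helper names `wait_for_resource`, `resource_node` and `cluster_resources` are hypothetical, and the 60-second default timeout is an arbitrary choice rather than a Pacemaker setting.

```bash
#!/usr/bin/env bash
# Sketch: time a failover by polling Pacemaker until a resource reports
# "Started <node>". Helper names and the 60 s default timeout are assumptions.

# Source of resource status; override this function to test the logic offline.
cluster_resources() {
  sudo pcs status resources
}

# Print the node a resource is started on, reading `pcs status resources` output.
resource_node() {
  awk -v r="$1" '/Started/ { for (i = 1; i <= NF; i++) if ($i == r) { print $NF; exit } }'
}

# Poll until resource $1 is started on node $2, or give up after $3 seconds.
wait_for_resource() {
  local rsc="$1" node="$2" timeout="${3:-60}" start=$SECONDS
  while (( SECONDS - start < timeout )); do
    if [[ "$(cluster_resources | resource_node "$rsc")" == "$node" ]]; then
      echo "$rsc started on $node after $(( SECONDS - start ))s"
      return 0
    fi
    sleep 1
  done
  echo "timed out after ${timeout}s waiting for $rsc on $node" >&2
  return 1
}

# Example (right after `sudo pcs node standby node-01`):
#   wait_for_resource VirtualIP node-02
```

The reported time is approximate, with one-second resolution. For hard-failure tests, run it from node-02, because node-01 will be down.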

Conclusion

The most important principle in HA clusters is STONITH (Shoot The Other Node In The Head): always configure a fencing mechanism. Without fencing, a split-brain scenario (both nodes believe they are active) can bring up the VIP and services on both nodes at once and corrupt shared data. Fencing ensures the failed node is powered off, or cut off from shared resources, before the standby takes over. On virtual machines, STONITH is typically implemented via the hypervisor API (VMware, libvirt); on physical hardware, use IPMI/iDRAC/iLO fencing.
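Below is a minimal sketch of IPMI fencing for the two nodes in this article. The BMC addresses (10.0.0.201/202), the `admin` user and the password are placeholders. Parameter names can also differ between fence-agents versions, so confirm them with `pcs stonith describe fence_ipmilan`. Pacemaker's default `stonith-enabled=true` keeps resources stopped until a fencing device exists. For that reason, this step belongs right after `pcs cluster setup`, before the VIP and nginx resources are created.

```bash
# Sketch only: IPMI fencing for node-01 and node-02. BMC IPs, user name and
# password below are placeholders, not values from this article.
sudo pcs stonith create fence-node-01 fence_ipmilan \
    pcmk_host_list=node-01 ip=10.0.0.201 username=admin password=CHANGE_ME lanplus=1 \
    op monitor interval=60s
sudo pcs stonith create fence-node-02 fence_ipmilan \
    pcmk_host_list=node-02 ip=10.0.0.202 username=admin password=CHANGE_ME lanplus=1 \
    op monitor interval=60s

# Keep each fencing device off the node it is meant to fence:
sudo pcs constraint location fence-node-01 avoids node-01
sudo pcs constraint location fence-node-02 avoids node-02

# Fencing is enabled by default; set it explicitly to be sure:
sudo pcs property set stonith-enabled=true

# Test (this reboots node-02 through its BMC):
# sudo pcs stonith fence node-02
```

On virtual machines, swap in the agent for your hypervisor, for example `fence_virsh` or `fence_xvm` for libvirt, or `fence_vmware_rest` for VMware. `sudo pcs stonith list` shows which agents are installed.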

FAQ

Are high availability clusters important for Ubuntu administrators?

Yes. Any service that must survive a node failure or planned maintenance, such as a web server behind a floating IP, depends on the clustering, fencing, and failover skills covered here.

Should I practice this on a live server?

Use a lab VM first. After you understand the command output and rollback path, apply the workflow carefully on real systems.

What should I do after reading this article?

Run the practice commands, write down what each one shows, and continue to the next article in the Ubuntu roadmap.

Need help with Ubuntu administration?

Work directly with Muhammad Irfan Aslam for Ubuntu Server, Linux, cloud, Docker, DevOps, CI/CD, or infrastructure troubleshooting support.
