Enterprise Backup Architecture

Enterprise backup is not about backing up data; it is about restoring data reliably when needed. The distinction matters because many organizations discover that their backups are unusable only during a disaster recovery event. A backup that has never been tested is not a backup; it is a hope. Enterprise backup architecture focuses on documented processes, regular restore testing, and meeting defined RPO (Recovery Point Objective) and RTO (Recovery Time Objective) targets.

Backup principles

3-2-1 Backup rule:
  3 copies of data (1 primary + 2 backups)
  2 different storage media types
  1 copy offsite (geographically separate)

  Example:
    Primary: production database on NVMe SSD (copy 1)
    Backup 1: daily backup on NFS storage in same datacenter (copy 2)
    Backup 2: weekly backup in S3 or different datacenter (copy 3, offsite)
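
The rule can also be checked mechanically against an inventory of copies. The following is a minimal sketch, assuming a hypothetical `Copy` record and media labels ("nvme", "nfs", "object-storage") chosen only for illustration; it mirrors the example above and is not tied to any backup tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    media: str     # hypothetical label, e.g. "nvme", "nfs", "object-storage"
    offsite: bool  # True if stored in a geographically separate location


def check_3_2_1(copies):
    """Return a list of rule violations; an empty list means compliant."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need 3")
    if len({c.media for c in copies}) < 2:
        problems.append("fewer than 2 storage media types")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    return problems


# Inventory matching the example above.
copies = [
    Copy("production database", "nvme", offsite=False),
    Copy("daily backup", "nfs", offsite=False),
    Copy("weekly backup", "object-storage", offsite=True),
]
print(check_3_2_1(copies) or "3-2-1 compliant")
```

Returning a list of violations rather than a boolean makes the result usable directly in an alert message.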

RPO (Recovery Point Objective):
  Maximum acceptable data loss in time
  RPO = 1 hour → back up at least every hour (lose at most about 1 hour of changes)

RTO (Recovery Time Objective):
  Maximum time to restore service after failure
  RTO = 4 hours → restore must complete within 4 hours
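
Both targets can be sanity-checked with simple arithmetic before a real restore test. The sketch below is a rough estimate under stated assumptions: a backup captures the state as of its start time, an unfinished backup is unusable, and restore speed is limited by a single sustained throughput figure. The function names, the 500 GB data size, the 100 MB/s throughput, and the 1 hour overhead are all illustrative values, not measurements.

```python
def worst_case_data_loss_hours(backup_interval_hours, backup_duration_hours=0.0):
    """Worst case: failure occurs just before the next backup finishes."""
    return backup_interval_hours + backup_duration_hours


def estimated_restore_hours(data_gb, throughput_mb_per_s, overhead_hours=0.0):
    """Transfer time plus fixed overhead (provisioning, validation, cutover)."""
    transfer_seconds = (data_gb * 1024) / throughput_mb_per_s
    return transfer_seconds / 3600 + overhead_hours


rpo_hours, rto_hours = 1.0, 4.0
loss = worst_case_data_loss_hours(backup_interval_hours=1, backup_duration_hours=0.25)
restore = estimated_restore_hours(data_gb=500, throughput_mb_per_s=100, overhead_hours=1)

for label, value, target in (
    ("data loss vs RPO", loss, rpo_hours),
    ("restore time vs RTO", restore, rto_hours),
):
    status = "OK" if value <= target else "MISSED"
    print(f"{label}: {value:.2f} h (target {target:.0f} h) {status}")
```

With these sample numbers the hourly schedule misses a strict 1 hour RPO once backup duration is counted, which is why the interval usually needs to be shorter than the RPO itself.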

Enterprise backup strategy

Data type	Frequency	Retention	Tool
Databases	Every 1-4 hours	30 days	mysqldump + binary log archiving (MySQL); pg_basebackup + WAL archiving (PostgreSQL)
Application files	Daily	90 days	restic to S3
System configs (/etc)	On every change	1 year	etckeeper (git)
VM snapshots	Daily	14 days	libvirt snapshots
Logs	Streaming	90-365 days	ELK, Loki, S3
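
The retention column can be expressed as data and used to decide which snapshots have aged out. This is a minimal sketch, assuming snapshot times are already available as timezone-aware datetimes; the dictionary keys and the `expired` helper are hypothetical names, and logs are left out because the table gives a range rather than a single value.

```python
from datetime import datetime, timedelta, timezone

# Retention periods in days, taken from the strategy table.
RETENTION_DAYS = {
    "databases": 30,
    "application_files": 90,
    "system_configs": 365,
    "vm_snapshots": 14,
}


def expired(snapshot_times, data_type, now):
    """Return the snapshot times that fall outside the retention window."""
    cutoff = now - timedelta(days=RETENTION_DAYS[data_type])
    return [t for t in snapshot_times if t < cutoff]


# Sample data: VM snapshots taken 1, 7, 13, 15 and 30 days before "now".
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
vm_snaps = [now - timedelta(days=d) for d in (1, 7, 13, 15, 30)]

for t in expired(vm_snaps, "vm_snapshots", now):
    print("expired:", t.date())
```

In practice the backup tool's own retention mechanism should do the deletion; a check like this is mainly useful for auditing that the configured policy matches the documented one.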

Backup tools comparison

Tool	Type	Encryption	Deduplication
restic	File-level incremental	Yes (AES-256)	Yes
rsync	File sync	In transit only (via SSH)	No
mysqldump	Database logical	Pipe to gpg	No
Barman	PostgreSQL dedicated	Yes	Yes
Duplicati	File-level to cloud	Yes	Yes

# restic to S3 (the recommended general-purpose backup solution):
sudo apt install -y restic

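# Initialize backup repository. restic prompts for a repository password
# unless RESTIC_PASSWORD or RESTIC_PASSWORD_FILE is set; without that
# password the backups cannot be restored, so store it safely:
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
restic -r s3:s3.amazonaws.com/mybucket/backups init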

# Back up /var/www and /etc:
restic -r s3:s3.amazonaws.com/mybucket/backups backup /var/www /etc

# List snapshots:
restic -r s3:s3.amazonaws.com/mybucket/backups snapshots

# Restore the latest snapshot to /restore (replace "latest" with a snapshot ID to restore a specific one):
restic -r s3:s3.amazonaws.com/mybucket/backups restore latest --target /restore

Testing and validating restores

# Schedule monthly restore tests and document each one:
# 1. Pick a random snapshot from 2 weeks ago
# 2. Restore to an isolated test environment
# 3. Verify application starts and data is intact
# 4. Document: test date, snapshot age, restore time, issues found
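
Steps 1 and 4 above lend themselves to a small script. The sketch below assumes the snapshot list has already been loaded into dictionaries with `id` and `time` keys; the helper names, the 3 day selection window, the sample snapshot IDs, and the 42 minute restore time are all hypothetical placeholders. Steps 2 and 3 (the restore and the application check) still happen outside this code.

```python
import json
import random
from datetime import datetime, timedelta, timezone


def pick_snapshot(snapshots, now, target_age_days=14, window_days=3, rng=random):
    """Pick a random snapshot whose age is within window_days of the target."""
    lo = timedelta(days=target_age_days - window_days)
    hi = timedelta(days=target_age_days + window_days)
    candidates = [s for s in snapshots if lo <= now - s["time"] <= hi]
    return rng.choice(candidates) if candidates else None


def build_record(snapshot, now, restore_minutes, issues):
    """Build the log entry for step 4: date, snapshot age, restore time, issues."""
    return {
        "test_date": now.date().isoformat(),
        "snapshot_id": snapshot["id"],
        "snapshot_age_days": (now - snapshot["time"]).days,
        "restore_minutes": restore_minutes,
        "issues": issues,
    }


# Sample data: one snapshot per day for the last 30 days.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
snapshots = [
    {"id": f"snap{d:02d}", "time": now - timedelta(days=d)} for d in range(1, 31)
]

chosen = pick_snapshot(snapshots, now, rng=random.Random(0))
record = build_record(chosen, now, restore_minutes=42, issues=[])
print(json.dumps(record, indent=2))
```

Appending each record to a file kept alongside the runbook gives a history of measured restore times to compare against the RTO.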

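# Verify restic repository integrity (checks structure and metadata by default;
# add --read-data to also read and verify every pack file, which is slower):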
restic -r s3:s3.amazonaws.com/mybucket/backups check

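# Check backup age (exit non-zero if the last backup is older than 25 hours):
restic -r s3:s3.amazonaws.com/mybucket/backups snapshots --json | python3 -c "
import sys, json, re
from datetime import datetime, timezone

def parse_time(ts):
    # restic emits up to nanosecond precision, which older Python versions
    # cannot parse; drop the fractional seconds before parsing
    return datetime.fromisoformat(re.sub(r'\.\d+', '', ts.replace('Z', '+00:00')))

snaps = json.load(sys.stdin)
if not snaps:
    sys.exit('ALERT: No snapshots found!')
# Use the newest snapshot regardless of list order
last = max(parse_time(s['time']) for s in snaps)
age_hours = (datetime.now(timezone.utc) - last).total_seconds() / 3600
print(f'Last backup: {age_hours:.1f} hours ago')
if age_hours > 25:
    sys.exit('ALERT: Backup is overdue!')
"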

Conclusion

Automate backup verification as a scheduled job separate from the backup itself. Without verification, a backup that runs successfully but produces a corrupt or incomplete archive will be discovered only during a restore. Use restic check to verify repository integrity, and add --read-data (or --read-data-subset for a sample) to confirm that the stored data itself is readable. Document your RPO and RTO targets, then verify quarterly that your backup schedule, retention, and restore speed actually meet them. A production outage is the worst time to learn that a restore takes 8 hours when the RTO target was 4.

FAQ

Why should administrators understand Enterprise Backup Architecture?

Because this topic affects planning decisions, server lifecycle, compatibility, support expectations, or how you reason about Ubuntu systems before making operational changes.

Do I need a lab for this topic?

A lab is useful for checking commands and seeing the concept on a real Ubuntu machine, but the main value is understanding the decision, tradeoff, or system behavior clearly.

How should I use this knowledge in production?

Use it to make better choices, document why those choices were made, and avoid rushed changes that ignore support windows, compatibility, stability, or operational risk.

Need help with Ubuntu administration?

Work directly with Muhammad Irfan Aslam for Ubuntu Server, Linux, cloud, Docker, DevOps, CI/CD, or infrastructure troubleshooting support.
