Enterprise Backup Architecture
Enterprise backup is not about backing up data — it is about restoring data reliably when needed. The distinction matters because many organizations discover their backups are unusable only during a disaster recovery event. A backup that has never been tested is not a backup; it is a hope. Enterprise backup architecture focuses on documented processes, regular restore testing, and meeting defined RPO (Recovery Point Objective) and RTO (Recovery Time Objective) targets.
Backup principles
3-2-1 Backup rule:
3 copies of data (1 primary + 2 backups)
2 different storage media types
1 copy offsite (geographically separate)
Example:
Primary: production database on NVMe SSD (copy 1)
Backup 1: daily backup on NFS storage in same datacenter (copy 2)
Backup 2: weekly backup in S3 or different datacenter (copy 3, offsite)
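As a rough illustration, the rule can be checked mechanically against an inventory of copies. The sketch below assumes a hypothetical three-column inventory format (name, media type, onsite/offsite) read from standard input; the function name `check_321` and the format itself are illustrative, not part of any backup tool.

```shell
# check_321: hypothetical helper that reads "name media location" lines
# from stdin and reports whether they satisfy the 3-2-1 rule.
check_321() {
  awk '
    NF >= 3 {
      copies++                      # each line describes one copy
      media[$2] = 1                 # collect distinct media types
      if ($3 == "offsite") offsite++
    }
    END {
      kinds = 0
      for (m in media) kinds++
      if (copies >= 3 && kinds >= 2 && offsite >= 1) {
        print "3-2-1 satisfied"
        exit 0
      }
      printf "3-2-1 violated: %d copies, %d media types, %d offsite\n", copies, kinds, offsite
      exit 1
    }'
}

# The example layout above, expressed in the assumed inventory format:
printf '%s\n' \
  'primary nvme onsite' \
  'backup1 nfs onsite' \
  'backup2 s3 offsite' | check_321
```

The function exits non-zero on a violation, so it can gate a monitoring check or a CI job.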
RPO (Recovery Point Objective):
Maximum acceptable data loss in time
RPO = 1 hour → backups every hour (lose at most 1 hour of changes)
RTO (Recovery Time Objective):
Maximum time to restore service after failure
RTO = 4 hours → restore must complete within 4 hours
Enterprise backup strategy
| Data type | Frequency | Retention | Tool |
|---|---|---|---|
| Databases | Every 1-4 hours | 30 days | mysqldump + binary log archiving (MySQL) or base backup + WAL archiving (PostgreSQL) |
| Application files | Daily | 90 days | restic to S3 |
| System configs (/etc) | On every change | 1 year | etckeeper (git) |
| VM snapshots | Daily | 14 days | libvirt snapshots |
| Logs | Streaming | 90-365 days | ELK, Loki, S3 |
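The frequencies in the table translate directly into worst-case data loss, so they can be compared against an RPO target. A minimal sketch, assuming a hypothetical helper `meets_rpo` that takes the backup interval and the RPO in minutes; it deliberately ignores backup duration and failed runs, both of which make the real worst case larger.

```shell
# meets_rpo INTERVAL_MIN RPO_MIN: hypothetical helper.
# Worst-case loss is roughly one backup interval, so the schedule
# meets the RPO only if the interval does not exceed it.
meets_rpo() (
  interval_min=$1
  rpo_min=$2
  if [ "$interval_min" -le "$rpo_min" ]; then
    echo "OK: interval ${interval_min}m <= RPO ${rpo_min}m"
  else
    echo "FAIL: interval ${interval_min}m > RPO ${rpo_min}m"
    exit 1
  fi
)

meets_rpo 60 60                     # hourly backups against a 1-hour RPO
meets_rpo 240 60 || echo "Tighten the schedule or relax the RPO"
```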
Backup tools comparison
| Tool | Type | Encryption | Deduplication |
|---|---|---|---|
| restic | File-level incremental | Yes (AES-256) | Yes |
| rsync | File sync | In transit only (via SSH) | No |
| mysqldump | Database logical | Pipe to gpg | No |
| Barman | PostgreSQL dedicated | Yes | Yes |
| Duplicati | File-level to cloud | Yes | Yes |
# restic to S3 (the recommended general-purpose backup solution):
sudo apt install -y restic
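# Initialize backup repository (restic prompts for a repository password;
# for unattended runs, point RESTIC_PASSWORD_FILE at a file readable only by root):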
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
restic -r s3:s3.amazonaws.com/mybucket/backups init
# Back up /var/www and /etc:
restic -r s3:s3.amazonaws.com/mybucket/backups backup /var/www /etc
# List snapshots:
restic -r s3:s3.amazonaws.com/mybucket/backups snapshots
# Restore the latest snapshot to /restore
# (replace "latest" with a snapshot ID from the list above to restore a specific one):
restic -r s3:s3.amazonaws.com/mybucket/backups restore latest --target /restore
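The retention periods in the strategy table are not enforced by `restic backup` itself: old snapshots stay in the repository until they are forgotten and pruned. A minimal sketch using `restic forget --keep-within` with `--prune`, wrapped in a hypothetical function `apply_retention`. The function name and the `RESTIC` override variable are illustrative assumptions; the override exists only so the command can be previewed with `RESTIC=echo`.

```shell
# apply_retention REPO DAYS: keep snapshots made within DAYS days of the
# newest snapshot, forget the rest, and reclaim their space.
# RESTIC is a hypothetical override used here for previewing the command.
apply_retention() (
  repo=$1
  days=$2
  "${RESTIC:-restic}" -r "$repo" forget --keep-within "${days}d" --prune
)

# Preview the command for the 90-day retention of application files:
RESTIC=echo apply_retention s3:s3.amazonaws.com/mybucket/backups 90
```

Dropping the `RESTIC=echo` prefix runs the real command, which deletes data, so it is worth previewing first.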
Testing and validating restores
# Schedule monthly restore tests — document them:
# 1. Pick a random snapshot from 2 weeks ago
# 2. Restore to an isolated test environment
# 3. Verify application starts and data is intact
# 4. Document: test date, snapshot age, restore time, issues found
# Verify restic backup integrity (check for corruption):
restic -r s3:s3.amazonaws.com/mybucket/backups check
# Check backup age (alert if last backup is older than 25 hours):
restic -r s3:s3.amazonaws.com/mybucket/backups snapshots --json | python3 -c "
import sys, json, re
from datetime import datetime, timezone
snaps = json.load(sys.stdin)
if not snaps:
    print('ALERT: No snapshots found!')
else:
    # Drop fractional seconds so fromisoformat also works on older Python versions
    times = [re.sub(r'\.\d+', '', s['time']).replace('Z', '+00:00') for s in snaps]
    last = max(datetime.fromisoformat(t) for t in times)
    age_hours = (datetime.now(timezone.utc) - last).total_seconds() / 3600
    print(f'Last backup: {age_hours:.1f} hours ago')
    if age_hours > 25:
        print('ALERT: Backup is overdue!')
"
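Steps 2 and 4 of the restore test (restore, then record the restore time) can be partly scripted. A minimal sketch, assuming a hypothetical wrapper `timed_restore` that runs any restore command, measures wall-clock time, and compares it with an RTO given in seconds:

```shell
# timed_restore RTO_SECONDS COMMAND...: run COMMAND and report elapsed time.
# Exit status: 0 = within RTO, 1 = RTO exceeded, 2 = the restore itself failed.
timed_restore() (
  rto=$1
  shift
  start=$(date +%s)
  "$@" || { echo "restore command failed"; exit 2; }
  elapsed=$(( $(date +%s) - start ))
  echo "restore took ${elapsed}s (RTO ${rto}s)"
  [ "$elapsed" -le "$rto" ]
)
```

For a 4-hour RTO this could be invoked as `timed_restore 14400 restic -r s3:s3.amazonaws.com/mybucket/backups restore latest --target /restore`. It only times the data restore; starting the application and verifying the data still need to be checked and added to the total.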
Conclusion
Automate backup verification as a scheduled job separate from the backup itself. Otherwise, a backup job that exits successfully but produces a corrupt or incomplete archive goes unnoticed until a restore is attempted. Use restic check to verify repository integrity; by default it checks repository structure and metadata, and with --read-data (or --read-data-subset for a sample) it also reads back and verifies the stored data. Document your RPO and RTO targets, then verify quarterly that your backup schedule, retention, and restore speed actually meet them. The worst time to discover that a restore takes 8 hours against a 4-hour RTO is during a production outage.
FAQ
Why should administrators understand Enterprise Backup Architecture?
Because this topic affects planning decisions, server lifecycle, compatibility, support expectations, or how you reason about Ubuntu systems before making operational changes.
Do I need a lab for this topic?
A lab is useful for checking commands and seeing the concept on a real Ubuntu machine, but the main value is understanding the decision, tradeoff, or system behavior clearly.
How should I use this knowledge in production?
Use it to make better choices, document why those choices were made, and avoid rushed changes that ignore support windows, compatibility, stability, or operational risk.