Scaling Backup and Restore for Large (20TB–200TB) Databases in SAP ASE
Now we move into real-world scale, where databases are so large that a simple backup strategy will fail due to:
Long backup windows (many hours)
Disk throughput limits
Restore time exceeding acceptable downtime
Massive transaction log growth
This guide is about how to design high-performance, scalable backup systems that work for 20TB–200TB databases.
1. The Reality of Large Databases
Let’s be honest: backing up a 200TB database is not just “running dump database.”
If you try:
dump database bigdb
to "/backup/bigdb_full.dmp"
You will likely face:
Backup takes 10–20+ hours
System I/O becomes saturated
Application performance degrades
Restore takes even longer
That is unacceptable in a banking system.
2. Key Concept: Backup Parallelism (Striping)
The most important concept for large databases is:
Parallel backups using multiple devices (striping)
Instead of writing to one file, you write to multiple files at the same time.
2.1 Why Striping Works
Each disk has limited throughput
Multiple disks provide higher combined throughput
Backup runs faster
Restore also runs faster
2.2 Example: Striped Backup
dump database bigdb
to "/backup1/bigdb_stripe1.dmp"
stripe on "/backup2/bigdb_stripe2.dmp"
stripe on "/backup3/bigdb_stripe3.dmp"
stripe on "/backup4/bigdb_stripe4.dmp"
with compression = 5
This runs 4 parallel streams.
2.3 How Many Stripes Should You Use?
Start simple:
20TB database → 4–8 stripes
50TB database → 8–16 stripes
100TB+ → 16–32 stripes
But don’t go extreme immediately—test gradually.
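As a rough back-of-envelope check, you can estimate the wall-clock backup time from the database size, the stripe count, and the sustained throughput of each backup device. The figures below are assumptions; substitute your own measurements:

#!/bin/bash
# Rough backup-time estimate; all figures are illustrative assumptions
DB_SIZE_TB=50
STRIPES=8
MB_PER_SEC_PER_STRIPE=200   # assumed sustained write throughput per device
TOTAL_MB=$((DB_SIZE_TB * 1024 * 1024))
SECONDS_NEEDED=$((TOTAL_MB / (STRIPES * MB_PER_SEC_PER_STRIPE)))
echo "Estimated backup time: ~$((SECONDS_NEEDED / 3600)) hours"

At 8 stripes and 200 MB/s per device, a 50TB database needs roughly 9 hours; doubling the stripes (on genuinely independent disks) roughly halves that.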
3. Storage Architecture for Backups
Your backup strategy is only as good as your storage.
3.1 Recommended Layout
Use multiple mount points:
/backup1/
/backup2/
/backup3/
/backup4/
Each should ideally be:
Separate disk / LUN
High throughput
Not shared with database data files
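A quick sanity check that the mount points really sit on separate filesystems (the paths are the example layout above):

df -h /backup1 /backup2 /backup3 /backup4

Each line of output should show a different underlying device; if two paths resolve to the same filesystem, your stripes will compete for the same disk.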
3.2 What NOT to Do
Avoid:
Single disk backups
Backups on same disk as database
Network bottlenecks
4. Designing Backup Strategy for Banking Systems
Now we combine everything.
4.1 Full Backup Strategy (Large DB)
Instead of daily full backups, consider:
Full backup: once every 2 days
Log backups: frequent
4.2 Transaction Log Strategy
For banking systems:
Every 5–15 minutes
Possibly every 1–5 minutes for high-volume systems
4.3 Example Schedule
Full backup: 2:00 AM
Log backups: every 10 minutes
Copy to remote: continuous
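One way to implement this schedule with plain cron; the wrapper script names are hypothetical (Section 9 shows what such a wrapper looks like):

# Full striped backup at 02:00
0 2 * * * /opt/dba/full_backup.sh >> /var/log/ase_full_backup.log 2>&1
# Transaction log backup every 10 minutes
*/10 * * * * /opt/dba/log_backup.sh >> /var/log/ase_log_backup.log 2>&1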
5. Advanced Backup Script
5.1 Full Backup with Striping
use master
go
dump database bigdb
to "/backup1/bigdb_full_$(date +%Y%m%d)_1.dmp"
stripe on "/backup2/bigdb_full_$(date +%Y%m%d)_2.dmp"
stripe on "/backup3/bigdb_full_$(date +%Y%m%d)_3.dmp"
stripe on "/backup4/bigdb_full_$(date +%Y%m%d)_4.dmp"
with compression = 5, init
go
Note: the $(date +%Y%m%d) pieces in the file names are shell substitutions; they only work when the SQL is sent through a shell here-document (as in Section 9). When running the dump interactively in isql, use a fixed file name instead.
5.2 Log Backup with Striping
use master
go
dump transaction bigdb
to "/backup1/bigdb_log_$(date +%Y%m%d_%H%M)_1.dmp"
stripe on "/backup2/bigdb_log_$(date +%Y%m%d_%H%M)_2.dmp"
with compression = 5
go
6. Performance Tuning for Backups
Now we optimize.
6.1 Monitor I/O Performance
Use OS tools:
iostat -x 5
Watch for:
%util near 100 → disk bottleneck
High wait times → slow storage
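A rough one-liner to flag saturated devices, assuming the sysstat version of iostat where %util is the last column:

iostat -dx 5 2 | awk 'NF && $1 != "Device" && $NF+0 > 90 {print $1, "%util =", $NF}'

Any device that keeps appearing here while a dump is running is your bottleneck (note that the first iostat report is a since-boot average, so pay attention to the later samples).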
6.2 Adjust Compression Level
Low compression = faster, larger files
High compression = slower, smaller files
Start with:
with compression = 3
Increase if CPU allows.
6.3 Use Backup Server Configuration
Check:
sp_configure "number of backup connections"
Increase if needed.
7. Restore Strategy for Large Databases
Backing up is only half the job.
Restoring a 100TB database quickly is the real challenge.
7.1 Parallel Restore (Same as Backup)
load database bigdb
from "/backup1/bigdb_full_20260514_1.dmp"
stripe on "/backup2/bigdb_full_20260514_2.dmp"
stripe on "/backup3/bigdb_full_20260514_3.dmp"
stripe on "/backup4/bigdb_full_20260514_4.dmp"
go
7.2 Apply Logs Sequentially
load transaction bigdb
from "/backup1/bigdb_log_20260514_0210_1.dmp"
stripe on "/backup2/bigdb_log_20260514_0210_2.dmp"
go
Repeat for each log backup.
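A minimal loop sketch for applying a day's worth of log dumps in order, assuming the two-stripe naming convention from Section 5.2 (the credentials, server name, and date are placeholders):

#!/bin/bash
# Load each transaction log dump in timestamp order
for STRIPE1 in $(ls /backup1/bigdb_log_20260514_*_1.dmp | sort); do
    STRIPE2="/backup2/$(basename "${STRIPE1%_1.dmp}")_2.dmp"
    isql -Uuser -Ppassword -Sserver <<EOF
load transaction bigdb
from "$STRIPE1"
stripe on "$STRIPE2"
go
EOF
done

After the last log has been applied, bring the database back online with online database bigdb.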
8. Reducing Restore Time (Critical for Banks)
8.1 Keep Backups on Fast Storage
SSD or high-speed SAN
Avoid slow network restores
8.2 Minimize Number of Log Files
Too many small log backups make the restore slow: every log dump must be loaded in sequence, so a 5-minute dump interval means 288 files per day to apply after a full restore, versus 96 at a 15-minute interval.
Balance dump frequency against acceptable restore time.
8.3 Practice Restore Regularly
At least:
Weekly full restore test
Monthly disaster simulation
9. Automating at Scale
9.1 Shell Script Example
#!/bin/bash
# Full striped dump of bigdb: one stripe per backup mount point
DATE=$(date +%Y%m%d)
isql -Uuser -Ppassword -Sserver <<EOF
dump database bigdb
to "/backup1/bigdb_full_${DATE}_1.dmp"
stripe on "/backup2/bigdb_full_${DATE}_2.dmp"
stripe on "/backup3/bigdb_full_${DATE}_3.dmp"
stripe on "/backup4/bigdb_full_${DATE}_4.dmp"
with compression = 5, init
go
EOF
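A matching wrapper for the striped log dumps from Section 5.2 (again a sketch; adjust credentials, stripe count, and paths to your environment):

#!/bin/bash
STAMP=$(date +%Y%m%d_%H%M)
isql -Uuser -Ppassword -Sserver <<EOF
dump transaction bigdb
to "/backup1/bigdb_log_${STAMP}_1.dmp"
stripe on "/backup2/bigdb_log_${STAMP}_2.dmp"
with compression = 5
go
EOF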
9.2 Logging Everything
./backup.sh > backup_$(date +%Y%m%d).log 2>&1
10. Free Tools for Large Environments
You don’t need expensive tools to be effective.
10.1 Useful Free Tools
rsync → copy backups
scp → secure transfer
cron → scheduling
iostat, vmstat → performance monitoring
grep, awk → log parsing
10.2 Example: Remote Backup Copy
rsync -av --progress /backup1/ backupserver:/remote_backup/
11. Monitoring and Alerts
You must detect failures immediately.
11.1 Simple Alert Script
grep -i error backup.log && mail -s "Backup Failed" dba@bank.com < backup.log
11.2 What to Monitor
Backup duration
File size anomalies
Failed jobs
Disk space
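For size anomalies, a crude but effective approach is to compare today's combined stripe size against yesterday's. This is only a sketch for GNU/Linux; the paths, the 20% threshold, and the mail recipient are examples:

#!/bin/bash
TODAY=$(date +%Y%m%d)
YESTERDAY=$(date -d "yesterday" +%Y%m%d)
SIZE_TODAY=$(du -cb /backup*/bigdb_full_${TODAY}_*.dmp 2>/dev/null | tail -1 | cut -f1)
SIZE_YDAY=$(du -cb /backup*/bigdb_full_${YESTERDAY}_*.dmp 2>/dev/null | tail -1 | cut -f1)
# Warn if today's full backup is more than 20% smaller than yesterday's
if [ -n "$SIZE_TODAY" ] && [ -n "$SIZE_YDAY" ] && [ "$SIZE_TODAY" -lt $((SIZE_YDAY * 80 / 100)) ]; then
    echo "Full backup shrank from $SIZE_YDAY to $SIZE_TODAY bytes" | mail -s "Backup size anomaly" dba@bank.com
fi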
12. Handling Failures
12.1 If Backup Fails
Check logs immediately
Retry backup
Ensure log backups continue
12.2 If Disk Fills Up
Delete old backups (carefully)
Move to remote storage
Increase capacity
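To catch a filling disk before it breaks the next dump, a simple threshold check can run from cron (85% is an arbitrary example):

#!/bin/bash
ALERT=$(df -P /backup1 /backup2 /backup3 /backup4 | awk 'NR > 1 && $5+0 > 85 {print $6, "is", $5, "full"}')
[ -n "$ALERT" ] && echo "$ALERT" | mail -s "Backup disk space warning" dba@bank.com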
13. Best Practices
Always use striping for large DBs
Separate backup storage from data
Keep multiple backup copies
Test restore frequently
Monitor everything
Document procedures
14. Expanded Checklist
Daily:
Full backup completed
Log backups every 5–15 min
Backup size looks correct
Files copied to remote
Weekly:
Test restore completed
Backup performance reviewed
Monthly:
Disaster recovery drill
Storage capacity check