Runbook for Production Migration from OnPrem SQL SERVER to SQL SERVER AWS EC2 with Minute-by-Minute Timing
Scenario Assumptions
* Source: On-prem SQL Server
* Target: SQL Server on Amazon EC2
* Method: Backup + Restore + Log Shipping
* Downtime target: 15–30 minutes
* Database size: 10TB–100TB
* Migration window: Weekend (example: Saturday night)
BIG PICTURE FLOW
* Days before → prepare and sync
* Final hour → freeze + final logs
* Minutes → cutover
* After → validate + stabilize
T-7 DAYS (ONE WEEK BEFORE)
Goal: Eliminate surprises
Tasks:
* Full inventory completed
* Test full backup + restore on EC2
* Measure restore time
* Test log shipping loop
* Validate application connectivity
Must Confirm:
* You can restore DB successfully on EC2
* Log shipping works continuously
* Performance is acceptable
T-1 DAY (FINAL PREPARATION)
Goal: Be 100% ready
Tasks:
* Notify stakeholders
* Confirm downtime window
* Freeze schema changes
* Verify backups
Run Final Pre-Check Script
```sql
SELECT name, state_desc FROM sys.databases;
DBCC CHECKDB('YourDB') WITH NO_INFOMSGS;
```
T-6 HOURS (START INITIAL LOAD)
Goal: Get most data to EC2 before downtime
Step 1: Take Full Backup
```sql
BACKUP DATABASE YourDB
TO DISK = 'G:\Backup\YourDB_1.bak',
DISK = 'G:\Backup\YourDB_2.bak',
DISK = 'G:\Backup\YourDB_3.bak'
WITH COMPRESSION, STATS = 5;
```
Step 2: Transfer Backup
```powershell
robocopy G:\Backup \\EC2-SQL\Backup *.bak /MT:16
```
Step 3: Restore on EC2 (NORECOVERY)
```sql
RESTORE DATABASE YourDB
FROM DISK = 'G:\Backup\YourDB_1.bak',
DISK = 'G:\Backup\YourDB_2.bak',
DISK = 'G:\Backup\YourDB_3.bak'
WITH NORECOVERY;
```
T-4 HOURS
Goal: Start continuous sync
Step: Start Log Shipping Loop
Every 5 minutes:
```sql
BACKUP LOG YourDB TO DISK = 'G:\Backup\log.trn';
```
Copy + restore automatically.
Monitor:
* Logs applying correctly
* No gaps
* No errors
T-60 MINUTES (CRITICAL PHASE STARTS)
Goal: Prepare for cutover
Tasks:
* Notify users: “System going read-only soon”
* Ensure log shipping is current
* Validate EC2 database
Validation Script
```sql
SELECT COUNT(*) FROM CriticalTable;
```
Compare with source.
T-30 MINUTES
Goal: Reduce final sync gap
Change Log Backup Frequency
* From every 5 min → every 1 min
Confirm:
* No backlog of logs
* EC2 nearly in sync
T-15 MINUTES (POINT OF NO RETURN NEAR)
Goal: Prepare final cutover
Tasks:
* Stop batch jobs
* Pause ETL processes
* Notify: “Final downtime starting”
T-10 MINUTES (WRITE FREEZE)
Goal: Stop all writes
Step 1: Disable Application Writes
Options:
* Stop application services
* Or set DB to read-only
---
```sql
ALTER DATABASE YourDB SET READ_ONLY;
```
Step 2: Verify No Active Transactions
```sql
SELECT * FROM sys.dm_tran_active_transactions;
```
T-5 MINUTES (FINAL SYNC)
Goal: Capture last changes
Step 1: Final Log Backup
```sql
BACKUP LOG YourDB
TO DISK = 'G:\Backup\final.trn'
WITH NORECOVERY;
```
Step 2: Copy Final Log
```powershell
robocopy G:\Backup \\EC2-SQL\Backup final.trn
```
Step 3: Apply Final Log
```sql
RESTORE LOG YourDB
FROM DISK = 'G:\Backup\final.trn'
WITH RECOVERY;
```
T-0 (CUTOVER MOMENT)
THIS IS THE SWITCH
Step 1: Bring EC2 Database Online
Already done with `WITH RECOVERY`
Step 2: Update Application Connection
Change:
* Connection string
* DNS alias (preferred)
Step 3: Enable Application
* Start services
* Allow connections
T+5 MINUTES
Goal: Smoke Test
Test:
* Login works
* Key queries run
* Application loads
SQL Test
```sql
SELECT TOP 10 * FROM CriticalTable;
```
T+15 MINUTES
Goal: Validate system health
Run:
```sql
DBCC CHECKDB('YourDB') WITH NO_INFOMSGS;
```
Check:
* CPU usage
* Disk latency
* Errors
T+30 MINUTES
Goal: Confirm success
Tasks
* Business validation (very important)
* Confirm no missing data
* Monitor performance
T+1 HOUR
Goal: Stabilization
Tasks
* Enable SQL Agent jobs
* Resume ETL
* Monitor logs
T+24 HOURS
Goal: Final confirmation
Tasks:
* Full backup on EC2
* Review performance
* Confirm no issues
ROLLBACK PLAN (MUST EXIST)
If something fails at T+5 or T+15:
Step 1: Redirect Application Back
* Point to on-prem SQL Server
Step 2: Re-enable Writes
```sql
ALTER DATABASE YourDB SET READ_WRITE;
```
Step 3: Investigate
---
REAL-WORLD TIMING SUMMARY
Bulk Data Movement:
* Happens hours before (T-6h)
Sync Phase:
* Continuous (T-4h → T-10m)
Downtime Window:
Only:T-10 minutes → T+5 minutes
PRO TIPS
Tip 1: Always Use DNS
Instead of changing app config: Point DNS to EC2
Tip 2: Practice Twice
* First run → learning
* Second run → timing
Tip 3: Record Everything
* Start time
* End time
* Issues
Tip 4: Have War Room
* DBA
* App team
* Network team
---
REALITY Check:
A successful migration is not about just moving data rather it is Timing, Coordination, Recovery readiness
No comments:
Post a Comment