War Room Script During The Actual Migration From On-prem SQL Server To SQL Server on AWS EC2
This separates a “technical person” from someone who can actually run a high-risk production migration. The template includes:
* who speaks
* who acts
* what to say /write
*when to pause, escalate, or rollback
With this script, you can:
* Lead a real production migration
* Speak with authority and clarity
* Handle failures without panic
Scenario:
* Migration: On-prem SQL Server → SQL Server on EC2
* Database: 10–100TB
* Downtime target: 20 minutes
* Environment: Mission-critical (finance / production)
WAR ROOM ROLES with CLEAR RESPONSIBILITIES
You must assign these BEFORE starting.
1. Migration Commander (YOU / LEAD DBA)
* Final decision maker
* Controls timeline
* Approves cutover / rollback
2. SQL Server Execution DBA
* Runs SQL scripts
* Monitors backup/restore/logs
3. Infrastructure Engineer
* EC2, storage, network
* Handles disk / IOPS / connectivity
4. Application Owner
* Validates application
* Confirms business functionality
5. Observer / Scriber
* Writes down:
* timestamps
* issues
* decisions
WAR ROOM RULES (READ THIS OUT LOUD / COMNICATE CLEARLY AT START)
*Opening Statemen by Migration Commander
"This is a controlled production migration. Only the assigned person speaks during each step. No action is taken without confirmation. If any critical issue occurs, we pause immediately. If data consistency is at risk, we rollback without hesitation."
PHASE 1: T-30 MINUTES (PRE-CUTOVER)
Migration Commander:
"We are at T minus 30 minutes.
Confirm all systems are ready.
SQL DBA, give me sync status."
SQL DBA:
"Log shipping is active.
Last log applied is within 1 minute.
No errors detected."
Commander to Infra
"Infrastructure status?"
Infra Engineer:
"EC2 is stable.
Disk usage is within limits.
No network issues."
Commander Decision:
"Proceed to pre-cutover validation."
PHASE 2: T-15 MINUTES (PREPARE FREEZE)
Commander:
"Application team, prepare to stop writes.
Confirm when ready."
Application Owner:
"Application ready for write freeze.
No critical jobs running."
PHASE 3: T-10 MINUTES (WRITE FREEZE)
Commander:
"Execute write freeze now."
SQL DBA (after running):
"Database set to read-only.
Checking active transactions."
SQL DBA (confirmation):
"No active transactions remaining."
IF FAILURE (TRANSACTIONS NOT STOPPING)
SQL DBA:
"We have active blocking transactions.
Requesting permission to terminate sessions."
Commander:
"Approved. Terminate only long-running sessions.
Proceed carefully."
PHASE 4: T-5 MINUTES (FINAL SYNC)
Commander:
"Proceed with final log backup."
SQL DBA:
"Final log backup started."
SQL DBA (after backup):
"Final log backup completed.
Copying to EC2."
SQL DBA:
"Final log restored with recovery.
Target database is online."
FAILURE BRANCH (CRITICAL)
If final log fails
SQL DBA:
"Final log backup failed.
Log chain may be broken."
Commander (STRICT RESPONSE):
"Stop all cutover activities.
We are aborting migration.
Re-enable source database immediately."
PHASE 5: T-0 (CUTOVER)
Commander:
"We are at cutover point.
Switch application connections to EC2."
Application Owner:
"Connection updated.
Starting application services."
PHASE 6: T+5 MINUTES (SMOKE TEST)
Commander:
"Run smoke tests now.
Application team, report status."
Application Owner:
"Application is accessible.
Basic functions are working."
Commander:
"SQL DBA, give me Database performance status."
SQL DBA:
"Database performance normal.
No errors in logs."
FAILURE BRANCH (APP FAILURE)
Application Owner:
"Users cannot log in.
Authentication errors detected."
SQL DBA:
"Likely orphaned users or missing logins.
Fixing now."
Commander:
"You have 10 minutes to resolve.
If unresolved, we rollback."
PHASE 7: T+15 MINUTES (VALIDATION)
Commander:
"Business validation in progress.
Application team, confirm data integrity."
Application Owner:
"Data appears consistent.
Key reports match expectations."
FAILURE BRANCH (DATA MISMATCH)
Application Owner:
"Data mismatch detected in financial totals."
Commander (NO DELAY)
"Stop.
We are initiating rollback.
Switch application back to source immediately."
ROLLBACK SCRIPT
Commander:
"Rollback initiated.
Application team, revert connection now."
SQL DBA:
"Source database set to read-write.
System restored."
PHASE 8: T+30 MINUTES (SUCCESS DECLARATION)
Commander:
"Migration successful.
System stable for 30 minutes.
Proceed to monitoring phase."
CRITICAL COMMANDER DECISION RULES
RULE 1: NEVER GUESS
If unsure:
"Pause all actions.
We need confirmation before proceeding."
RULE 2: DATA MORE IMPORTANT THAN SPEED
If risk:
"We are prioritizing data integrity over timeline."
RULE 3: CLEAR TIMEBOX
Every issue:
"You have 10 minutes to resolve.
After that, we rollback."
Most migrations don’t fail because of technology but because of confusion, poor communication, and no clear leader.
No comments:
Post a Comment