Wednesday, May 6, 2026

Live Migration Simulation War Room with Different Failure Scenarios


(Migrating From On-prem SQL Server To SQL Server on AWS EC2)


This is designed as a live migration war room scenario, where things go wrong at specific minutes and you must react correctly.


Scenario Setup


* Database: 25TB (mission-critical financial system)


* Method: Backup + Restore + Log Shipping


* Downtime target: 20 minutes


* Migration window: Saturday 10:00 PM




SIMULATION MODE


We’ll simulate time exactly like a real migration.


You are the DBA. This article tells you what happens.


Your job: understand the correct response.



T-6 HOURS (4:00 PM)



Status: Everything looks good


* Full backup completed


* Restore started on EC2



FAILURE #1: RESTORE IS TOO SLOW


At T-4 hours, restore is only 40% complete.
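
A quick way to confirm how far behind you really are is to check the running RESTORE session's progress and estimated finish time. This is a minimal sketch using sys.dm_exec_requests; column values are estimates only:

```sql
-- Progress and estimated remaining time for the running RESTORE
SELECT
    r.session_id,
    r.command,
    r.percent_complete,
    r.estimated_completion_time / 60000.0 AS EstimatedMinutesRemaining
FROM sys.dm_exec_requests AS r
WHERE r.command LIKE 'RESTORE%';
```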


What this means: You will miss your migration window.


Root Cause:


* Not enough IOPS on EC2


* Backup not split into enough files


Correct Response


Option A (Best if early enough)


* Stop restore


* Increase disk IOPS


* Restart restore


Option B (If too late)


Continue restore and:


* Delay migration window


* Inform stakeholders immediately


Wrong Move


* Ignoring the delay and hoping it finishes



T-60 MINUTES (9:00 PM)


Status: Restore completed, log shipping running


Everything seems fine.



FAILURE #2: LOG SHIPPING SILENTLY FAILED


You check logs:


```sql


SELECT * FROM msdb.dbo.restorehistory;

```


Last log applied = 30 minutes ago


What this means


EC2 is 30 minutes behind production


Root Cause:


* Copy job failed


* Network interruption


* Disk full on EC2



Correct Response:


Step 1: STOP CUTOVER PLAN


Do NOT proceed.



Step 2: Identify Missing Logs


Check source backup folder.



Step 3: Re-copy logs


```powershell


robocopy G:\Backup \\EC2-SQL\Backup *.trn

```


Step 4: Apply logs manually


```sql


RESTORE LOG YourDB FROM DISK = 'missing.trn' WITH NORECOVERY;

```



Step 5: Confirm sync is current
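
One way to confirm the target is current is to compare the most recent restored log against the current time. A minimal sketch run against msdb on the EC2 instance, assuming the database is named YourDB:

```sql
-- Most recent transaction log restore applied to the target database
SELECT TOP 1
    rh.destination_database_name,
    rh.restore_date,
    DATEDIFF(MINUTE, rh.restore_date, GETDATE()) AS MinutesBehind
FROM msdb.dbo.restorehistory AS rh
WHERE rh.destination_database_name = 'YourDB'
  AND rh.restore_type = 'L'   -- 'L' = transaction log restore
ORDER BY rh.restore_date DESC;
```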



Wrong Move:


* Proceeding with cutover while the target is behind, which can cause data loss



T-15 MINUTES (9:45 PM)



Status: Sync restored, all good



FAILURE #3: ACTIVE TRANSACTIONS WON’T STOP


You try to freeze writes.


```sql


ALTER DATABASE YourDB SET READ_ONLY;

```


It hangs.



Root Cause:


* Long-running transactions still active



Correct Response:



Step 1: Identify blocking sessions


```sql


SELECT session_id, blocking_session_id

FROM sys.dm_exec_requests;

```



Step 2: Kill long transactions (carefully)


```sql


KILL <session_id>;

```



Step 3: Confirm zero active writes
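
A minimal check for remaining open transactions before you continue; any rows returned mean writes may still be in flight:

```sql
-- Sessions that still hold an open transaction
SELECT
    st.session_id,
    at.transaction_begin_time,
    at.name AS transaction_name
FROM sys.dm_tran_session_transactions AS st
JOIN sys.dm_tran_active_transactions AS at
    ON st.transaction_id = at.transaction_id;
```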


Wrong Move:


* Forcing cutover without stopping transactions, which can cause inconsistent data.



T-5 MINUTES (9:55 PM)


Status: Ready for final log backup


FAILURE #4: FINAL LOG BACKUP FAILS


```sql


BACKUP LOG YourDB TO DISK = 'G:\Backup\YourDB_final.trn';

```


Error:


```plaintext


BACKUP LOG cannot be performed because there is no current database backup

```


Root Cause: Someone switched the recovery model to SIMPLE earlier.
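
Before reacting, verify the current recovery model so you know whether the log chain really is broken. A minimal check:

```sql
-- Confirm the recovery model of the source database
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'YourDB';
```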



CRITICAL SITUATION:


* You cannot continue the log chain

* You cannot guarantee data consistency


Correct Response:


Option A (Best)


Abort the migration, which means:


* Re-enable application


* Switch DB back to READ_WRITE


Option B (If business forces migration)


* Take FULL backup immediately


```sql


BACKUP DATABASE YourDB TO DISK = 'emergency_full.bak';

```


* Restore on EC2 WITH RECOVERY


* Accept downtime with risk



Wrong Move:


* Ignoring error and continuing



T-0 (10:00 PM CUTOVER)


Status: You proceed successfully



FAILURE #5: APPLICATION FAILS


Users report:


* Login failures


* “User not found”


Root Cause:


* Missing SQL logins (SID mismatch)



Correct Response:


Fix orphan users


```sql


ALTER USER username WITH LOGIN = loginname;

```



Or recreate logins with SID
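
To find which database users are orphaned after the restore, a minimal sketch (database users whose SID has no matching server login on the new instance):

```sql
-- Database users with no matching SQL login on the new instance
SELECT dp.name AS orphaned_user, dp.sid
FROM sys.database_principals AS dp
LEFT JOIN sys.server_principals AS sp
    ON dp.sid = sp.sid
WHERE sp.sid IS NULL
  AND dp.type = 'S'               -- SQL users
  AND dp.authentication_type = 1; -- instance-authenticated users
```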


Wrong Move:


* Restarting SQL Server blindly




T+10 MINUTES



FAILURE #6: DATA MISMATCH FOUND


Finance team says: “Totals don’t match”



Root Cause:


* Final log not applied correctly


* Writes occurred during cutover


CRITICAL DECISION POINT



Correct Response:


If source DB still intact:


1. Take new log backup:


```sql


BACKUP LOG YourDB TO DISK = 'fix.trn';

```


2. Apply to EC2:


```sql


RESTORE LOG YourDB FROM DISK = 'fix.trn' WITH RECOVERY;

```



If not fixable: ROLLBACK



Wrong Move:


* Ignoring the mismatch, which can become a business disaster



T+20 MINUTES


FAILURE #7: PERFORMANCE COLLAPSE


System is online but:


* Queries slow


* CPU 100%



Root Cause:


* Missing statistics


* Different execution plans


Correct Response:


```sql


EXEC sp_updatestats;

```


```sql


ALTER INDEX ALL ON YourTable REBUILD;

```



Wrong Move:


* Increasing instance size immediately without diagnosis



T+30 MINUTES


FINAL DECISION POINT



If everything stable: declare success



If major issue persists: execute rollback



FULL ROLLBACK SIMULATION


---


Step 1: Redirect application back


* Change DNS or connection string



Step 2: Enable writes on source


```sql


ALTER DATABASE YourDB SET READ_WRITE;

```



Step 3: Notify stakeholders



Step 4: Investigate root cause



LESSONS 



1. Most failures happen BEFORE cutover


* Restore speed


* Log sync



2. Most dangerous moment


* Final 10 minutes



3. Biggest risks


* Broken log chain


* Data inconsistency


* Missing logins



4. Golden rule


NEVER proceed if unsure


---


HOW TO PRACTICE 


You can practice this safely:



Lab Practice


1. Create test DB (100GB+)


2. Run migration


3. Introduce failures:


   * Delete a log file


   * Break log chain


   * Kill network


   * Change recovery model


Goal: Train your reaction, not just knowledge


---


A real DBA is not judged by how smoothly things go, but by how well they recover when things break.

BUILDING PRODUCTION MONITORING & ALERTING SYSTEM After Migrating From On-prem SQL Server To SQL Server on AWS EC2



This guide shows how to build a production-ready monitoring system that can:


* Detect problems early


* Alert the right people


* Help you diagnose fast


* Prevent outages



BIG PICTURE ARCHITECTURE


You are building 4 layers:


1. SQL Server Internal Monitoring (inside SQL Server)


2. OS + EC2 Monitoring (via AWS)


3. Central Logging & Alerting (SQL Agent + email + logs)


4. Response Automation (PowerShell + alerts)




WHAT YOU MUST MONITOR (CRITICAL SIGNALS)




DATABASE HEALTH:


* Database state (ONLINE/OFFLINE)


* Corruption


* Backup success/failure




PERFORMANCE:


* CPU usage


* Slow queries


* Blocking / deadlocks




STORAGE:


* Disk space


* IOPS latency




AVAILABILITY:


* Always On status


* Connection failures




PART 1: SQL SERVER MONITORING (CORE)




1.1 Create Monitoring Database


```sql 


CREATE DATABASE DBA_Monitoring;

GO

```


---


1.2 Table for Alerts


```sql 


USE DBA_Monitoring;


CREATE TABLE Alerts (

    AlertID INT IDENTITY(1,1),

    AlertType NVARCHAR(100),

    Message NVARCHAR(MAX),

    Severity INT,

    CreatedAt DATETIME DEFAULT GETDATE()

);

```




PART 2: CRITICAL MONITORING QUERIES




2.1 Check Database Status


```sql 


INSERT INTO DBA_Monitoring.dbo.Alerts (AlertType, Message, Severity)

SELECT 

    'Database Offline',

    name + ' is not ONLINE',

    1

FROM sys.databases

WHERE state_desc <> 'ONLINE';

```




2.2 Check Failed Backups


```sql 


INSERT INTO DBA_Monitoring.dbo.Alerts (AlertType, Message, Severity)

SELECT 

    'Backup Failure',

    'No backup in last 24 hours',

    1

WHERE NOT EXISTS (

    SELECT 1 FROM msdb.dbo.backupset

    WHERE backup_finish_date > DATEADD(HOUR, -24, GETDATE())

);

```




2.3 Check Disk Space


```sql 


EXEC xp_fixeddrives;

```


Alert if free space < 20%
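
xp_fixeddrives only returns free megabytes per drive, not a percentage, so a practical sketch is to alert below a fixed free-space threshold and log it into the Alerts table from Part 1. The 20 GB threshold here is an assumption; adjust it to your volumes:

```sql
-- Capture xp_fixeddrives output and log an alert for drives below the threshold
DECLARE @Drives TABLE (Drive CHAR(1), FreeMB INT);
INSERT INTO @Drives EXEC xp_fixeddrives;

INSERT INTO DBA_Monitoring.dbo.Alerts (AlertType, Message, Severity)
SELECT
    'Low Disk Space',
    'Drive ' + Drive + ' has only ' + CAST(FreeMB AS VARCHAR(20)) + ' MB free',
    1
FROM @Drives
WHERE FreeMB < 20 * 1024;   -- below roughly 20 GB free
```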




2.4 Check Blocking


```sql 


SELECT blocking_session_id, session_id

FROM sys.dm_exec_requests

WHERE blocking_session_id <> 0;

```




2.5 Check Long Running Queries


```sql 


SELECT TOP 10

    total_elapsed_time / execution_count AS AvgTime,

    execution_count

FROM sys.dm_exec_query_stats

ORDER BY AvgTime DESC;

```




2.6 Deadlock Detection


```sql 


DBCC TRACEON (1222, -1);

```




PART 3: SQL AGENT ALERT JOB




3.1 Create Monitoring Job


```sql 


USE msdb;


EXEC sp_add_job @job_name = 'DBA_Monitoring_Job';

```


3.2 Add Step


```sql 


EXEC sp_add_jobstep

@job_name = 'DBA_Monitoring_Job',

@step_name = 'Run Checks',

@subsystem = 'TSQL',

@command = '
-- Placeholders: wrap the Part 2 checks in stored procedures with these names,
-- or paste the Part 2 queries directly into this step.
EXEC DBA_Monitoring.dbo.usp_Check_DatabaseStatus;
EXEC DBA_Monitoring.dbo.usp_Check_BackupFailure;
';

```


3.3 Schedule Every 5 Minutes


```sql 


EXEC sp_add_schedule
@schedule_name = 'Every5Min',
@freq_type = 4,
@freq_subday_type = 4,
@freq_subday_interval = 5;

-- Attach the schedule to the job and target the local server so the job actually runs
EXEC sp_attach_schedule
@job_name = 'DBA_Monitoring_Job',
@schedule_name = 'Every5Min';

EXEC sp_add_jobserver
@job_name = 'DBA_Monitoring_Job';

```




PART 4: EMAIL ALERTING SYSTEM


4.1 Configure Database Mail


```sql 


EXEC sp_configure 'Database Mail XPs', 1;

RECONFIGURE;

```


4.2 Send Alert Email


```sql 


EXEC msdb.dbo.sp_send_dbmail

@profile_name = 'DBA_Profile',

@recipients = 'dba@company.com',

@subject = 'SQL Alert',

@body = 'Critical issue detected';

```


4.3 Auto Email on Alert


```sql 


IF EXISTS (SELECT 1 FROM DBA_Monitoring.dbo.Alerts WHERE Severity = 1)

BEGIN

    EXEC msdb.dbo.sp_send_dbmail

    @profile_name = 'DBA_Profile',

    @recipients = 'dba@company.com',

    @subject = 'CRITICAL SQL ALERT',

    @body = 'Check DBA_Monitoring database immediately';

END

```




PART 5: AWS CLOUDWATCH MONITORING


5.1 Key Metrics to Enable


Monitor in AWS:


* CPUUtilization

* DiskReadOps / WriteOps

* NetworkIn / Out


5.2 CloudWatch Alarm Example


Trigger alert if:


* CPU > 80% for 5 minutes
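
A minimal sketch of creating that alarm with the AWS CLI (assumes the CLI is installed; the instance ID, SNS topic ARN, and region are placeholders you must replace):

```powershell
# CloudWatch alarm: average CPU > 80% over one 5-minute period
aws cloudwatch put-metric-alarm `
  --alarm-name "EC2-SQL-HighCPU" `
  --namespace "AWS/EC2" `
  --metric-name CPUUtilization `
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 `
  --statistic Average `
  --period 300 `
  --evaluation-periods 1 `
  --threshold 80 `
  --comparison-operator GreaterThanThreshold `
  --alarm-actions arn:aws:sns:us-east-1:111122223333:dba-alerts
```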


5.3 Disk Alert


Trigger if:


* Free space < 15%




PART 6: POWERSHELL ALERT AUTOMATION


6.1 Check Disk Space Script


```powershell 


$drives = Get-PSDrive -PSProvider FileSystem


foreach ($drive in $drives) {

    if ($drive.Free -lt 20GB) {

        Write-Host "Low disk space on $($drive.Name)"

    }

}

```


6.2 Send Alert


```powershell 


Send-MailMessage -To "dba@company.com" `

-From "alert@company.com" `

-Subject "Disk Alert" `

-Body "Low disk space detected" `

-SmtpServer "smtp.server.com"

```




PART 7: ALWAYS ON MONITORING


7.1 Check Replica Health


```sql 


SELECT 
    ar.replica_server_name,
    drs.synchronization_state_desc
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON drs.replica_id = ar.replica_id;

```


Alert if: NOT SYNCHRONIZED
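
To fold this into the alerting table from Part 1, a minimal sketch that logs a critical alert for any replica database that is not synchronized (uses the same join as the query above):

```sql
INSERT INTO DBA_Monitoring.dbo.Alerts (AlertType, Message, Severity)
SELECT
    'AG Not Synchronized',
    ar.replica_server_name + ' / ' + DB_NAME(drs.database_id)
        + ' is ' + drs.synchronization_state_desc,
    1
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON drs.replica_id = ar.replica_id
WHERE drs.synchronization_state_desc <> 'SYNCHRONIZED';
```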




PART 8: ALERT SEVERITY MODEL


Severity Levels


* 1 (Critical) → Immediate action (DB down, corruption)


* 2 (Warning) → Investigate (slow queries)


* 3 (Info) → Logging only




PART 9: RESPONSE AUTOMATION


Example: Restart Failed Job


```powershell 


Invoke-Sqlcmd -Query "EXEC msdb.dbo.sp_start_job 'JobName'"

```


Example: Kill Blocking Session


```sql 


KILL <session_id>;

```




PART 10: DAILY DBA DASHBOARD QUERY



```sql 


SELECT TOP 50 *

FROM DBA_Monitoring.dbo.Alerts

ORDER BY CreatedAt DESC;

```


PRODUCTION CHECKLIST


You are production-ready if:



SQL Level


* Monitoring job running


* Alerts being logged


* Emails working


AWS Level


* CloudWatch alarms active

* Disk + CPU monitored



Response Level


* Scripts tested


* Alerts actionable


---


Most outages happen because there is no alert, the alert is ignored, or the alert is unclear.


War Room Script During The Actual Migration From On-prem SQL Server To SQL Server on AWS EC2



This separates a “technical person” from someone who can actually run a high-risk production migration. The template includes:


* who speaks


* who acts


* what to say /write


* when to pause, escalate, or roll back


With this script, you can:


* Lead a real production migration


* Speak with authority and clarity


* Handle failures without panic



Scenario:


* Migration: On-prem SQL Server → SQL Server on EC2


* Database: 10–100TB


* Downtime target: 20 minutes


* Environment: Mission-critical (finance / production)


WAR ROOM ROLES with CLEAR RESPONSIBILITIES


You must assign these BEFORE starting.


1. Migration Commander (YOU / LEAD DBA)


* Final decision maker


* Controls timeline


* Approves cutover / rollback



2. SQL Server Execution DBA


* Runs SQL scripts


* Monitors backup/restore/logs


3. Infrastructure Engineer


* EC2, storage, network


* Handles disk / IOPS / connectivity


4. Application Owner


* Validates application


* Confirms business functionality


5. Observer / Scribe


* Writes down:


  * timestamps


  * issues


  * decisions



WAR ROOM RULES (READ THIS OUT LOUD / COMMUNICATE CLEARLY AT START)


* Opening Statement by Migration Commander


"This is a controlled production migration. Only the assigned person speaks during each step. No action is taken without confirmation. If any critical issue occurs, we pause immediately. If data consistency is at risk, we rollback without hesitation."



PHASE 1: T-30 MINUTES (PRE-CUTOVER)


Migration Commander:


"We are at T minus 30 minutes.

Confirm all systems are ready.

SQL DBA, give me sync status."


SQL DBA:


"Log shipping is active.

Last log applied is within 1 minute.

No errors detected."


Commander (to Infra):


"Infrastructure status?"



Infra Engineer:


"EC2 is stable.

Disk usage is within limits.

No network issues."


Commander Decision:


"Proceed to pre-cutover validation."



PHASE 2: T-15 MINUTES (PREPARE FREEZE)


Commander:


"Application team, prepare to stop writes.

Confirm when ready."


Application Owner:


"Application ready for write freeze.

No critical jobs running."


PHASE 3: T-10 MINUTES (WRITE FREEZE)



Commander:


"Execute write freeze now."


SQL DBA (after running):


"Database set to read-only.

Checking active transactions."


SQL DBA (confirmation):


"No active transactions remaining."



IF FAILURE (TRANSACTIONS NOT STOPPING)


SQL DBA:


"We have active blocking transactions.

Requesting permission to terminate sessions."


Commander:


"Approved. Terminate only long-running sessions.

Proceed carefully."



PHASE 4: T-5 MINUTES (FINAL SYNC)



Commander:


"Proceed with final log backup."


SQL DBA:


"Final log backup started."



SQL DBA (after backup):


"Final log backup completed.

Copying to EC2."


SQL DBA:


"Final log restored with recovery.

Target database is online."



FAILURE BRANCH (CRITICAL)



If final log fails



SQL DBA:


"Final log backup failed.

Log chain may be broken."



Commander (STRICT RESPONSE):


"Stop all cutover activities.

We are aborting migration.

Re-enable source database immediately."



PHASE 5: T-0 (CUTOVER)



Commander:


"We are at cutover point.

Switch application connections to EC2."



Application Owner:


"Connection updated.

Starting application services."



PHASE 6: T+5 MINUTES (SMOKE TEST)




Commander:


"Run smoke tests now.

Application team, report status."


Application Owner:


"Application is accessible.

Basic functions are working."



Commander:


"SQL DBA, give me Database performance status."



SQL DBA:


"Database performance normal.

No errors in logs."



FAILURE BRANCH (APP FAILURE)


Application Owner:


"Users cannot log in.

Authentication errors detected."


SQL DBA:


"Likely orphaned users or missing logins.

Fixing now."


Commander:


"You have 10 minutes to resolve.

If unresolved, we rollback."




PHASE 7: T+15 MINUTES (VALIDATION)




Commander:


"Business validation in progress.

Application team, confirm data integrity."




Application Owner:


"Data appears consistent.

Key reports match expectations."




FAILURE BRANCH (DATA MISMATCH)




Application Owner:


"Data mismatch detected in financial totals."




Commander (NO DELAY)


"Stop.

We are initiating rollback.

Switch application back to source immediately."




ROLLBACK SCRIPT




Commander:


"Rollback initiated.

Application team, revert connection now."




SQL DBA:


"Source database set to read-write.

System restored."




PHASE 8: T+30 MINUTES (SUCCESS DECLARATION)




Commander:


"Migration successful.

System stable for 30 minutes.

Proceed to monitoring phase."




CRITICAL COMMANDER DECISION RULES




RULE 1: NEVER GUESS


If unsure:


"Pause all actions.

We need confirmation before proceeding."




RULE 2: DATA MORE IMPORTANT THAN SPEED


If risk:


"We are prioritizing data integrity over timeline."




RULE 3: CLEAR TIMEBOX


Every issue:


"You have 10 minutes to resolve.

After that, we rollback."



Most migrations don’t fail because of technology but because of confusion,  poor communication, and no clear leader.

Tuesday, May 5, 2026

USE OF AUTOMATION IN MIGRATING OnPrem SQL Server to AWS EC2 With VLDBs


The automation uses two layers:


* PowerShell → orchestration, file movement, scheduling


* SQL scripts → backup, restore, validation, HADR



Automate:


1. Full backup (multi-file, compressed)

2. Transaction log backups (loop)

3. File transfer (robocopy)

4. Restore (full + logs)

5. Continuous sync loop

6. Final cutover

7. Post-migration validation

8. Optional HADR setup (Always On)



BEFORE YOU RUN ANYTHING, customize these variables and use them consistently everywhere:


```powershell


$SourceServer = "ONPREM-SQL"

$TargetServer = "EC2-SQL"

$Database = "YourDB"

$BackupPath = "G:\Backup"

$TargetBackupPath = "\\EC2-SQL\Backup"

$DataPath = "D:\Data"

$LogPath = "E:\Logs"

```



PART 1: FULL BACKUP AUTOMATION (SQL)


Run on source (on-prem SQL Server)



1.1 Full Backup Script (Multi-file + Compression)


```sql


DECLARE @DBName NVARCHAR(100) = 'YourDB';

DECLARE @Path NVARCHAR(200) = 'G:\Backup\';

DECLARE @SQL NVARCHAR(MAX);


SET @SQL = '

BACKUP DATABASE [' + @DBName + ']

TO 

DISK = ''' + @Path + @DBName + '_1.bak'',

DISK = ''' + @Path + @DBName + '_2.bak'',

DISK = ''' + @Path + @DBName + '_3.bak'',

DISK = ''' + @Path + @DBName + '_4.bak''

WITH COMPRESSION, STATS = 5;

';


EXEC(@SQL);

```


PART 2: FILE TRANSFER AUTOMATION (PowerShell)


Run on the source server



2.1 Copy Backup Files to EC2


```powershell


$Source = "G:\Backup"

$Destination = "\\EC2-SQL\Backup"


robocopy $Source $Destination *.bak /E /Z /MT:16 /R:3 /W:5 /LOG:C:\copy.log

```



PART 3: RESTORE FULL BACKUP (SQL)


Run on target (EC2 SQL Server)



3.1 Restore with NORECOVERY


```sql


RESTORE DATABASE [YourDB]

FROM 

DISK = 'G:\Backup\YourDB_1.bak',

DISK = 'G:\Backup\YourDB_2.bak',

DISK = 'G:\Backup\YourDB_3.bak',

DISK = 'G:\Backup\YourDB_4.bak'

WITH 

MOVE 'YourDB_Data' TO 'D:\Data\YourDB.mdf',

MOVE 'YourDB_Log' TO 'E:\Logs\YourDB.ldf',

NORECOVERY,

STATS = 5;

```


PART 4: LOG SHIPPING AUTOMATION (CONTINUOUS SYNC)


This is the core automation loop



4.1 SQL: Transaction Log Backup (Source)


```sql


DECLARE @DBName NVARCHAR(100) = 'YourDB';

DECLARE @Path NVARCHAR(200) = 'G:\Backup\';

DECLARE @FileName NVARCHAR(200);


SET @FileName = @Path + @DBName + '_log_' 

    + REPLACE(CONVERT(VARCHAR, GETDATE(), 120), ':', '-') + '.trn';


BACKUP LOG [YourDB]

TO DISK = @FileName

WITH COMPRESSION, STATS = 5;

```



4.2 PowerShell: Continuous Copy + Restore Loop


Run this on a utility server or EC2


```powershell


while ($true)

{

    Write-Host "Starting sync cycle..."


    # Step 1: Backup log (remote execution)


    $stamp = Get-Date -Format 'yyyyMMdd-HHmmss'

    Invoke-Sqlcmd -ServerInstance $SourceServer -Query "
    BACKUP LOG [$Database]
    TO DISK = '$BackupPath\${Database}_log_$stamp.trn'
    WITH COMPRESSION;
    "


    # Step 2: Copy logs


    robocopy $BackupPath $TargetBackupPath *.trn /MT:16 /R:2 /W:2


    # Step 3: Restore logs



    $files = Get-ChildItem "$TargetBackupPath\*.trn"


    foreach ($file in $files)

    {

        $query = "

        RESTORE LOG [$Database]

        FROM DISK = '$($file.FullName)'

        WITH NORECOVERY;

        "


        Invoke-Sqlcmd -ServerInstance $TargetServer -Query $query


        # Move processed file


        Move-Item $file.FullName "$TargetBackupPath\processed\" -Force

    }


    Start-Sleep -Seconds 300  # 5 minutes

}

```



PART 5: FINAL CUTOVER AUTOMATION


5.1 Final Log Backup (Source)


```sql


BACKUP LOG [YourDB]

TO DISK = 'G:\Backup\YourDB_final.trn'

WITH NORECOVERY;

```



5.2 PowerShell: Final Copy + Restore


```powershell


# Copy final log


robocopy $BackupPath $TargetBackupPath YourDB_final.trn


# Restore final log WITH RECOVERY


Invoke-Sqlcmd -ServerInstance $TargetServer -Query "

RESTORE LOG [$Database]

FROM DISK = '$TargetBackupPath\YourDB_final.trn'

WITH RECOVERY;

"

```



PART 6: POST-MIGRATION VALIDATION AUTOMATION



6.1 Check Database Status


```powershell


Invoke-Sqlcmd -ServerInstance $TargetServer -Query "

SELECT name, state_desc FROM sys.databases WHERE name = '$Database';

"

```



6.2 Run Integrity Check


```powershell


Invoke-Sqlcmd -ServerInstance $TargetServer -Query "

DBCC CHECKDB('$Database') WITH NO_INFOMSGS;

"

```



6.3 Update Statistics


```powershell


Invoke-Sqlcmd -ServerInstance $TargetServer -Query "

USE [$Database];

EXEC sp_updatestats;

"

```



PART 7: AUTOMATED BACKUP JOB (POST-MIGRATION)



7.1 SQL Agent Job Script


```sql


USE msdb;


EXEC sp_add_job

@job_name = 'FullBackupJob';


EXEC sp_add_jobstep

@job_name = 'FullBackupJob',

@step_name = 'BackupStep',

@subsystem = 'TSQL',

@command = '

BACKUP DATABASE YourDB

TO DISK = ''G:\Backup\YourDB_full.bak''

WITH COMPRESSION;

';


EXEC sp_add_schedule

@schedule_name = 'DailySchedule',

@freq_type = 4,

@freq_interval = 1;


EXEC sp_attach_schedule

@job_name = 'FullBackupJob',

@schedule_name = 'DailySchedule';


EXEC sp_add_jobserver

@job_name = 'FullBackupJob';

```


PART 8: ALWAYS ON AUTOMATION (HADR)



8.1 Enable Always On


```sql


EXEC sp_configure 'show advanced options', 1;

RECONFIGURE;

EXEC sp_configure 'Always On Availability Groups', 1;

RECONFIGURE;

```



8.2 Create Availability Group


```sql


CREATE AVAILABILITY GROUP YourAG

FOR DATABASE YourDB

REPLICA ON 

'N1' WITH (

    ENDPOINT_URL = 'TCP://N1:5022',

    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,

    FAILOVER_MODE = AUTOMATIC

),

'N2' WITH (

    ENDPOINT_URL = 'TCP://N2:5022',

    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,

    FAILOVER_MODE = AUTOMATIC

);

```



PART 9: ERROR HANDLING (IMPORTANT)


Add logging in PowerShell:


```powershell


try {

    Invoke-Sqlcmd -ServerInstance $TargetServer -Query "SELECT 1"

}

catch {

    Write-Host "Error occurred: $_"

    $_ | Out-File -FilePath "C:\error.log" -Append

}

```



PART 10: FULL AUTOMATION FLOW 


Here’s the mental automation pipeline:



STEP 1: Run full backup (SQL)


STEP 2: Copy backup (PowerShell)



STEP 3: Restore (SQL)



STEP 4: Start log loop (PowerShell + SQL)



STEP 5: Final log + recovery



STEP 6: Switch application



STEP 7: Run validation + monitoring


POST-MIGRATION of OnPrem SQL Server to AWS EC2 With VLDBs



Think of this phase as:


“Make the system stable, fast, resilient, and production-ready”


We will go step by step in a logical sequence, just like a real DBA would operate in the first days and weeks after migration.


1. Immediate Post-Migration Validation (First 1–2 Hours)


Before doing anything advanced, confirm the system is correct and usable.



1.1 Verify Database State



```sql
SELECT name, state_desc 

FROM sys.databases;

```


Expected Result: `ONLINE`



1.2 Run DBCC CHECKDB (Integrity Check)


This is non-negotiable for mission-critical systems.



```sql
DBCC CHECKDB ('YourDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;

```


1.3 Validate Data Consistency


Check:


* Row counts

* Key business tables

* Financial totals (if applicable)


Example:



```sql
SELECT COUNT(*) FROM CriticalTable;

```
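
A minimal sketch for comparing a key table's row count between source and target from PowerShell; the server names, database, and table name are the placeholders used elsewhere in this series:

```powershell
# Compare a business-critical table's row count on source and target
$query = "SELECT COUNT(*) AS Cnt FROM dbo.CriticalTable;"

$sourceCount = (Invoke-Sqlcmd -ServerInstance "ONPREM-SQL" -Database "YourDB" -Query $query).Cnt
$targetCount = (Invoke-Sqlcmd -ServerInstance "EC2-SQL" -Database "YourDB" -Query $query).Cnt

if ($sourceCount -ne $targetCount) {
    Write-Host "MISMATCH: source=$sourceCount target=$targetCount"
} else {
    Write-Host "Row counts match: $sourceCount"
}
```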



1.4 Validate Application Connectivity


* Application connects successfully

* No login failures

* No timeout errors



2. Fix Common Post-Migration Issues


---


2.1 Orphan Users



```sql
ALTER USER username WITH LOGIN = loginname;

```



2.2 Missing SQL Agent Jobs


Recreate jobs:



```sql
USE msdb;

SELECT name FROM sysjobs;

```


2.3 Fix Broken Linked Servers



```sql
EXEC sp_addlinkedserver 

    @server = 'LinkedServerName',

    @srvproduct = '',

    @provider = 'SQLNCLI',

    @datasrc = 'RemoteServer';

```



3. Configure BACKUP STRATEGY (CRITICAL)


If you skip this, you are running blind.



3.1 Full Backup Strategy



```sql
BACKUP DATABASE YourDB

TO DISK = 'G:\Backup\YourDB_full.bak'

WITH COMPRESSION;

```


Frequency:


* Daily (large DBs: weekly full + differential)



3.2 Differential Backup



```sql
BACKUP DATABASE YourDB

TO DISK = 'G:\Backup\YourDB_diff.bak'

WITH DIFFERENTIAL;

```



3.3 Transaction Log Backup



```sql
BACKUP LOG YourDB

TO DISK = 'G:\Backup\YourDB_log.trn';

```


Frequency:


* Every 5–15 minutes



3.4 Store Backups in AWS S3


Best practice:


* Copy backups to S3 for durability

* Use lifecycle policies
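
A minimal sketch of pushing the nightly backup to S3 with the AWS CLI; the bucket name, key, and storage class are assumptions to adapt:

```powershell
# Copy the latest full backup to S3 for off-instance durability
aws s3 cp "G:\Backup\YourDB_full.bak" "s3://my-sql-backups/YourDB/YourDB_full.bak" --storage-class STANDARD_IA
```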



4. Setup High Availability (HADR)


Now we make the system resilient.



OPTION 1: ALWAYS ON AVAILABILITY GROUPS (BEST PRACTICE)



4.1 Architecture


* Primary replica → EC2 instance 1

* Secondary replica → EC2 instance 2

* Optional → third replica



4.2 Enable Always On



```sql
EXEC sp_configure 'show advanced options', 1;

RECONFIGURE;

EXEC sp_configure 'Always On Availability Groups', 1;

RECONFIGURE;

```



4.3 Create Availability Group



```sql
CREATE AVAILABILITY GROUP YourAG

FOR DATABASE YourDB

REPLICA ON 

    'PrimaryServer' WITH (

        ENDPOINT_URL = 'TCP://PrimaryServer:5022',

        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,

        FAILOVER_MODE = AUTOMATIC

    ),

    'SecondaryServer' WITH (

        ENDPOINT_URL = 'TCP://SecondaryServer:5022',

        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,

        FAILOVER_MODE = AUTOMATIC

    );

```



4.4 Join Secondary Replica



```sql
ALTER DATABASE YourDB 

SET HADR AVAILABILITY GROUP = YourAG;

```


4.5 Test Failover



```sql
ALTER AVAILABILITY GROUP YourAG FAILOVER;

```


OPTION 2: LOG SHIPPING (Simpler DR)



Setup Steps:


1. Backup logs from primary

2. Copy to secondary

3. Restore with NORECOVERY



OPTION 3: AWS Multi-AZ (Manual Design on EC2)


* Use multiple EC2 instances

* Combine with Always On



5. Performance Tuning (VERY IMPORTANT)


After migration, performance often changes.



5.1 Update Statistics



```sql
EXEC sp_updatestats;

```



5.2 Rebuild Indexes



```sql
ALTER INDEX ALL ON YourTable REBUILD;

```



5.3 Check Slow Queries



```sql
SELECT TOP 10

    total_elapsed_time/execution_count AS AvgTime,

    execution_count

FROM sys.dm_exec_query_stats

ORDER BY AvgTime DESC;

```



5.4 Configure MAXDOP



```sql
EXEC sp_configure 'max degree of parallelism', 8;

RECONFIGURE;

```



5.5 Optimize TempDB


* Multiple data files

* Equal size
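
A minimal sketch of adding equally sized TempDB data files; the file count, sizes, and the F:\ path follow the storage layout suggested in this series and should be adjusted to your core count and disk layout:

```sql
-- Size the existing file and add three more equally sized data files
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev,  SIZE = 8192MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = 'F:\TempDB\tempdev2.ndf', SIZE = 8192MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev3, FILENAME = 'F:\TempDB\tempdev3.ndf', SIZE = 8192MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev4, FILENAME = 'F:\TempDB\tempdev4.ndf', SIZE = 8192MB, FILEGROWTH = 512MB);
```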



6. Monitoring and Alerting



6.1 Use AWS CloudWatch


Monitor:


* CPU

* Disk IOPS

* Network



6.2 SQL Server Monitoring Queries



```sql
SELECT 

    wait_type, 

    wait_time_ms

FROM sys.dm_os_wait_stats

ORDER BY wait_time_ms DESC;

```


6.3 Enable Alerts


* Disk full

* Job failure

* High CPU



7. Security Hardening



7.1 Disable Unused Accounts



```sql
ALTER LOGIN username DISABLE;

```



7.2 Enforce Encryption


* Use SSL connections

* Enable TDE if required



7.3 Restrict Ports


Only allow:


* 1433 (SQL)

* 3389 (RDP, restricted)



8. Cost Optimization (AWS-Specific)



8.1 Right-Size EC2


* Avoid over-provisioning

* Monitor usage



8.2 Optimize Storage


* Use gp3 if possible

* Reduce unused volumes



8.3 Stop Non-Production Servers



9. Disaster Recovery Planning



9.1 Cross-Region Backup


* Copy backups to another AWS region



9.2 Test Restore Regularly


Never assume backups work—test them.



9.3 Define RPO and RTO


* RPO → Data loss tolerance

* RTO → Recovery time



10. Real DBA Daily Operations


After migration, your daily job includes:



Daily Tasks


* Check backups

* Check failed jobs

* Monitor performance



Weekly Tasks


* Index maintenance

* Review slow queries



Monthly Tasks


* Capacity planning

* Cost review



11. Troubleshooting Common Issues




Issue: High CPU


Fix:


* Identify query

* Optimize index


---


Issue: Slow Disk


Fix:


* Increase IOPS

* Move to faster storage


---


Issue: Blocking



```sql
SELECT * FROM sys.dm_exec_requests;

```


---


Issue: Deadlocks


Enable trace:



```sql
DBCC TRACEON (1222, -1);

```


---


12. Final Mental Model (Production Readiness)


You are production-ready if:


* Backups are running

* HA is configured

* Monitoring is active

* Performance is stable

* DR is tested


---

SUMMARY


BEFORE MIGRATION:


* Inventory complete

* AWS ready

* Backup tested

* Migration plan documented


---


DURING MIGRATION:


* Full restore completed

* Logs continuously applied

* Final cutover executed

* Application switched


---


AFTER MIGRATION:


* DB validated

* Backups configured

* HA implemented

* Performance tuned

* Monitoring active


Sunday, May 3, 2026

EXECUTION PHASE OF MIGRATING OnPrem SQL Server to AWS EC2 With VLDBs




Think of this phase as:


“Move data safely, keep systems consistent, and switch with minimal downtime”



1. Choose the Execution Flow 


For large, mission-critical databases, the safest pattern is:


Hybrid Migration Flow


1. Full Backup → Restore on EC2


2. Continuous Sync → Log Shipping or AWS DMS


3. Final Cutover → Apply last logs + switch application


This avoids long downtime.



2. Step 1: Take Initial Full Backup (SOURCE SERVER)


This is your starting point.


Script: Full Backup (Multi-file for Speed)



```sql
BACKUP DATABASE YourDB

TO DISK = 'G:\Backup\YourDB_1.bak',

   DISK = 'G:\Backup\YourDB_2.bak',

   DISK = 'G:\Backup\YourDB_3.bak',

   DISK = 'G:\Backup\YourDB_4.bak'

WITH 

    COMPRESSION,

    STATS = 10;

```


Why this matters:


* Parallel writes → faster backup


* Compression → smaller transfer size


* Essential for 10–100TB DBs



3. Step 2: Transfer Backup to AWS EC2


Now you must move huge data safely.



Option A: Fast Network Transfer


Use:


* SMB copy


* Robocopy


Example:


```

robocopy G:\Backup \\EC2-IP\Backup /E /Z /MT:16

```


Option B: AWS DataSync (Recommended for Large Data)


* Automates transfer


* Handles retries


* Optimized for large datasets



Option C: AWS Snowball (Offline Transfer)


If bandwidth is slow:


* Copy data to device


* Ship to AWS


4. Step 3: Restore Database on EC2 (TARGET SERVER)


Now restore the database.



Script: Restore with NORECOVERY



```sql
RESTORE DATABASE YourDB

FROM DISK = 'G:\Backup\YourDB_1.bak',

     DISK = 'G:\Backup\YourDB_2.bak',

     DISK = 'G:\Backup\YourDB_3.bak',

     DISK = 'G:\Backup\YourDB_4.bak'

WITH 

    MOVE 'YourDB_Data' TO 'D:\Data\YourDB.mdf',

    MOVE 'YourDB_Log' TO 'E:\Logs\YourDB.ldf',

    NORECOVERY,

    STATS = 10;

```


Important:


* `NORECOVERY` keeps DB in restoring mode


* Required for applying logs later



5. Step 4: Start Continuous Synchronization


Now we keep EC2 updated while production is still running.



OPTION 1: LOG SHIPPING (Most Reliable)



5.1 Take Transaction Log Backups (Source)



```sql
BACKUP LOG YourDB

TO DISK = 'G:\Backup\YourDB_log_1.trn'

WITH COMPRESSION;

```




5.2 Copy Logs to EC2


Use:



```
robocopy G:\Backup \\EC2-IP\Backup *.trn /MT:16

```



5.3 Restore Logs on EC2



```sql
RESTORE LOG YourDB

FROM DISK = 'G:\Backup\YourDB_log_1.trn'

WITH NORECOVERY;

```



Automate This Loop


Repeat every:


* 5 minutes (high critical)


* 15 minutes (normal)



OPTION 2: AWS Database Migration Service (DMS)



When to Use DMS


* Near-zero downtime needed

* Mixed workloads

* Continuous replication



Steps:


1. Create DMS replication instance

2. Define source (on-prem SQL Server)

3. Define target (EC2 SQL Server)

4. Enable CDC (Change Data Capture)

5. Start replication


Key Setting:


* Full load + ongoing replication



6. Monitor Synchronization


You must confirm:


* No missing logs

* No delays

* No failures



Check Restore Status



```sql
SELECT 

    database_id,

    state_desc

FROM sys.databases

WHERE name = 'YourDB';

```


7. Performance Considerations During Sync


7.1 Avoid Network Bottlenecks


* Use enhanced networking

* Monitor bandwidth


7.2 Monitor Disk Throughput


* Use CloudWatch (AWS)

* Ensure IOPS is sufficient



7.3 Monitor SQL Server Load



```sql
SELECT 

    cpu_count, 

    scheduler_count 

FROM sys.dm_os_sys_info;

```


8. Pre-Cutover Validation (VERY IMPORTANT)


Before switching, verify:



8.1 Row Counts



```sql
SELECT COUNT(*) FROM YourTable;

```


Compare source vs target.



8.2 DBCC CHECKDB



```sql
DBCC CHECKDB('YourDB');

```



8.3 Check Users


Fix orphan users:



```sql
EXEC sp_change_users_login 'Auto_Fix', 'username';

```


9. FINAL CUTOVER (Critical Moment)


This is where downtime happens—but we minimize it.



Step-by-Step Cutover


9.1 Announce Downtime


Inform users:


* Application will be offline

* Freeze transactions


9.2 Stop Application Writes


* Disable app connections

* Or set DB to read-only


9.3 Take Final Log Backup



```sql
BACKUP LOG YourDB

TO DISK = 'G:\Backup\YourDB_final.trn'

WITH NORECOVERY;

```


9.4 Transfer Final Log


Copy to EC2.



9.5 Apply Final Log



```sql
RESTORE LOG YourDB

FROM DISK = 'G:\Backup\YourDB_final.trn'

WITH RECOVERY;

```


Important:


* `WITH RECOVERY` brings DB online



10. Switch Application to EC2


Update:


* Connection strings

* DNS

* Application configs


11. Post-Cutover Smoke Test


Immediately test:


* Login works

* Queries run

* Data is correct


12. Common Problems (And Fixes)



Problem 1: Slow Restore


Fix:


* Use multiple files

* Increase IOPS



Problem 2: Missing Log Files


Fix:


* Re-copy logs

* Maintain sequence



Problem 3: Orphan Users


Fix:



```sql
ALTER USER username WITH LOGIN = loginname;

```



Problem 4: Performance Drop


Fix:


* Update statistics



```sql
EXEC sp_updatestats;

```


13. Real Downtime Optimization Tips



Reduce Downtime by:


* Frequent log backups

* Smaller final log

* Pre-tested scripts

* Automation



14. During Migration Checklist (Mental Model)


Before moving to next stage, confirm:


* Full backup restored successfully

* Logs continuously applied

* No data loss

* Final log applied successfully

* Application switched

* Users validated system


Saturday, May 2, 2026

FOUNDATION PHASE FOR MIGRATING OnPrem SQL Server to AWS EC2 With VLDBs



This is the most important part.

If you rush here, everything later becomes painful.


Think of this phase as: “Measure twice, cut once”



1. Understand What You Are Migrating


Before touching AWS, you must deeply understand your current SQL Server environment.


1.1 Inventory Everything


You need a full inventory of:


* SQL Server versions (2012, 2016, 2019, etc.)


* Edition (Standard, Enterprise)


* Database sizes (10TB, 50TB, 100TB)


* Number of databases


* Dependencies (apps, jobs, ETL, reports)


* Linked servers


* SQL Agent jobs


* SSIS / SSRS / SSAS usage



Script: Get Database Sizes



```sql
SELECT 

    name AS DatabaseName,

    size * 8 / 1024 AS SizeMB

FROM sys.master_files

WHERE type = 0;

```



1.2 Identify Critical Databases


Not all databases are equal.


Classify them:


* Tier 1 → Mission-critical (financial, production)


* Tier 2 → Important


* Tier 3 → Non-critical


Focus most effort on Tier 1 (your 10–100TB databases)



1.3 Understand Workload


Ask:


* Peak hours?


* Transactions per second?


* Read vs Write ratio?


* Heavy queries?


Script: Top Queries



```sql
SELECT TOP 10

    total_worker_time/execution_count AS AvgCPU,

    execution_count,

    query_hash

FROM sys.dm_exec_query_stats

ORDER BY AvgCPU DESC;

```



2. Choose Migration Strategy


There is no single method. You must choose wisely.


2.1 Migration Types


Option 1: Backup and Restore (Most Common)


* Simple


* Reliable


* Best for large databases


* Requires downtime


Option 2: Log Shipping (Minimal Downtime)


* Continuous sync


* Good for large DBs


* Manual failover


Option 3: AWS DMS (Database Migration Service)


* Continuous replication


* Good for near-zero downtime


* Free tier available (limited)


Option 4: Always On Availability Groups (Advanced)


* Near zero downtime


* Complex


* Best for mission-critical



2.2 Recommendation for 10–100TB


Use Hybrid Approach:


* Initial load → Backup/Restore


* Sync → Log Shipping or DMS


* Cutover → Final log restore



3. Prepare AWS Environment


Now we move into AWS.



3.1 Create EC2 Instance for SQL Server



Key Decisions:


* Instance Type:


  * Memory optimized → r5, r6i


* Storage:


  * Use EBS io2 or gp3


* Network:


  * Place in VPC



3.2 Storage Design (VERY IMPORTANT)


For large databases:


* Data files → separate volume


* Log files → separate volume


* TempDB → separate volume


Best Practice Layout:


* C:\ → OS

* D:\ → Data

* E:\ → Logs

* F:\ → TempDB

* G:\ → Backups



3.3 Enable Enhanced Networking


Improves throughput for large data transfers.



4. Install SQL Server on EC2


Steps:


1. Launch Windows Server EC2


2. Install SQL Server (same or higher version)


3. Configure:


   * Max memory

   * TempDB

   * Parallelism



Script: Set Max Memory



```sql
EXEC sp_configure 'max server memory', 50000;

RECONFIGURE;

```



5. Use Free AWS Migration Tools


Here are the most important ones:



5.1 AWS Schema Conversion Tool (SCT)


Use this to:


* Analyze compatibility


* Detect issues before migration



5.2 AWS Database Migration Service (DMS)


Use this for:


* Continuous replication


* Minimal downtime migration



5.3 AWS DataSync


Useful for:


* Moving large backup files (10TB+)



5.4 AWS Snowball (Offline Transfer)


If internet is slow:


* AWS sends a device


* You copy data


* Ship it back



6. Network Planning


Migration fails without good networking.



6.1 Connectivity Options


* VPN 


* Direct Connect (best for large data)



6.2 Open Required Ports


* SQL Server → 1433


* RDP → 3389



7. Backup Strategy Before Migration


You must take full backups before touching anything



7.1 Full Backup



```sql
BACKUP DATABASE YourDB

TO DISK = 'G:\Backup\YourDB_full.bak'

WITH COMPRESSION;

```



7.2 Transaction Log Backup



```sql
BACKUP LOG YourDB

TO DISK = 'G:\Backup\YourDB_log.trn';

```



7.3 Verify Backup



```sql
RESTORE VERIFYONLY

FROM DISK = 'G:\Backup\YourDB_full.bak';

```



8. Security Preparation



8.1 Script Logins



```sql
SELECT 

    name, 

    sid 

FROM sys.sql_logins;

```


Use Microsoft's sp_help_revlogin script to transfer logins with their passwords and SIDs.



8.2 Permissions


Script:


* Users


* Roles


* Permissions



9. Prepare for Large Database Migration (10–100TB)


This is where things get serious.



9.1 Split Backup Files



```sql
BACKUP DATABASE YourDB

TO DISK = 'G:\Backup\YourDB_1.bak',

   DISK = 'G:\Backup\YourDB_2.bak',

   DISK = 'G:\Backup\YourDB_3.bak'

WITH COMPRESSION;

```


This improves speed.



9.2 Use Compression


Reduces size significantly.



9.3 Test Restore Time


Always test restore on EC2 before real migration.



10. Dry Run (VERY IMPORTANT)


Never migrate directly.


Do a full test:


1. Backup

2. Transfer

3. Restore

4. Validate


11. Create Migration Runbook


Write down:


* Step order

* Commands

* Downtime window

* Rollback plan



12. Rollback Plan


Always prepare failure plan:


* Keep original DB untouched

* Be ready to switch back


13. Pre-Migration Checklist (Mental Model)


Before moving to the next step, you should be able to answer:


* Do I know all databases and sizes?


* Did I test backup and restore?


* Is AWS environment ready?


* Is SQL Server configured?


* Do I have login scripts?


* Do I have rollback plan?


