Wednesday, May 6, 2026

BUILDING MONITORING & ALERTING SYSTEM After Migrating From On-prem SQL Server To SQL Server on AWS EC2

 BUILDING PRODUCTION MONITORING & ALERTING SYSTEM After Migrating From On-prem SQL Server To SQL Server on AWS EC2



This guides to build a production-ready monitoring system that can:


* Detects problems early


* Alerts the right people


* Helps you diagnose fast


* Prevents outages



BIG PICTURE ARCHITECTURE


You are building 4 layers:


1. SQL Server Internal Monitoring (inside SQL Server)


2. OS + EC2 Monitoring(via AWS)


3. Central Logging & Alerting (SQL Agent + email + logs)


4. Response Automation(PowerShell + alerts)




WHAT YOU MUST MONITOR (CRITICAL SIGNALS)




DATABASE HEALTH:


* Database state (ONLINE/OFFLINE)


* Corruption


* Backup success/failure




PERFORMANCE:


* CPU usage


* Slow queries


* Blocking / deadlocks




STORAGE:


* Disk space


* IOPS latency




AVAILABILITY:


* Always On status


* Connection failures




PART 1: SQL SERVER MONITORING (CORE)




1.1 Create Monitoring Database


```sql 


CREATE DATABASE DBA_Monitoring;

GO

```


---


1.2 Table for Alerts


```sql 


USE DBA_Monitoring;


CREATE TABLE Alerts (

    AlertID INT IDENTITY(1,1),

    AlertType NVARCHAR(100),

    Message NVARCHAR(MAX),

    Severity INT,

    CreatedAt DATETIME DEFAULT GETDATE()

);

```




PART 2: CRITICAL MONITORING QUERIES




2.1 Check Database Status


```sql 


INSERT INTO DBA_Monitoring.dbo.Alerts (AlertType, Message, Severity)

SELECT 

    'Database Offline',

    name + ' is not ONLINE',

    1

FROM sys.databases

WHERE state_desc <> 'ONLINE';

```




2.2 Check Failed Backups


```sql 


INSERT INTO DBA_Monitoring.dbo.Alerts (AlertType, Message, Severity)

SELECT 

    'Backup Failure',

    'No backup in last 24 hours',

    1

WHERE NOT EXISTS (

    SELECT 1 FROM msdb.dbo.backupset

    WHERE backup_finish_date > DATEADD(HOUR, -24, GETDATE())

);

```




2.3 Check Disk Space


```sql 


EXEC xp_fixeddrives;

```


Alert if free space < 20%




2.4 Check Blocking


```sql 


SELECT blocking_session_id, session_id

FROM sys.dm_exec_requests

WHERE blocking_session_id <> 0;

```




2.5 Check Long Running Queries


```sql 


SELECT TOP 10

    total_elapsed_time / execution_count AS AvgTime,

    execution_count

FROM sys.dm_exec_query_stats

ORDER BY AvgTime DESC;

```




2.6 Deadlock Detection


```sql 


DBCC TRACEON (1222, -1);

```




PART 3: SQL AGENT ALERT JOB




3.1 Create Monitoring Job


```sql 


USE msdb;


EXEC sp_add_job @job_name = 'DBA_Monitoring_Job';

```


3.2 Add Step


```sql 


EXEC sp_add_jobstep

@job_name = 'DBA_Monitoring_Job',

@step_name = 'Run Checks',

@subsystem = 'TSQL',

@command = '

EXEC check_script for Database Status;

EXEC check_script for Backup Failure;

';

```


3.3 Schedule Every 5 Minutes


```sql 


EXEC sp_add_schedule

@schedule_name = 'Every5Min',

@freq_type = 4,

@freq_subday_type = 4,

@freq_subday_interval = 5;

```




PART 4: EMAIL ALERTING SYSTEM


4.1 Configure Database Mail


```sql 


EXEC sp_configure 'Database Mail XPs', 1;

RECONFIGURE;

```


4.2 Send Alert Email


```sql 


EXEC msdb.dbo.sp_send_dbmail

@profile_name = 'DBA_Profile',

@recipients = 'dba@company.com',

@subject = 'SQL Alert',

@body = 'Critical issue detected';

```


4.3 Auto Email on Alert


```sql 


IF EXISTS (SELECT 1 FROM DBA_Monitoring.dbo.Alerts WHERE Severity = 1)

BEGIN

    EXEC msdb.dbo.sp_send_dbmail

    @profile_name = 'DBA_Profile',

    @recipients = 'dba@company.com',

    @subject = 'CRITICAL SQL ALERT',

    @body = 'Check DBA_Monitoring database immediately';

END

```




PART 5: AWS CLOUDWATCH MONITORING


5.1 Key Metrics to Enable


Monitor in AWS:


* CPUUtilization

* DiskReadOps / WriteOps

* NetworkIn / Out


5.2 CloudWatch Alarm Example


Trigger alert if:


* CPU > 80% for 5 minutes


5.3 Disk Alert


Trigger if:


* Free space < 15%




PART 6: POWERSHELL ALERT AUTOMATION


6.1 Check Disk Space Script


```powershell 


$drives = Get-PSDrive -PSProvider FileSystem


foreach ($drive in $drives) {

    if ($drive.Free -lt 20GB) {

        Write-Host "Low disk space on $($drive.Name)"

    }

}

```


6.2 Send Alert


```powershell 


Send-MailMessage -To "dba@company.com" `

-From "alert@company.com" `

-Subject "Disk Alert" `

-Body "Low disk space detected" `

-SmtpServer "smtp.server.com"

```




PART 7: ALWAYS ON MONITORING


7.1 Check Replica Health


```sql 


SELECT 

    replica_server_name,

    synchronization_state_desc

FROM sys.dm_hadr_database_replica_states;

```


Alert if: NOT SYNCHRONIZED




PART 8: ALERT SEVERITY MODEL


Severity Levels


* 1 (Critical) → Immediate action (DB down, corruption)


* 2 (Warning) → Investigate (slow queries)


* 3 (Info) → Logging only




ART 9: RESPONSE AUTOMATION


Example: Restart Failed Job


```powershell 


Invoke-Sqlcmd -Query "EXEC msdb.dbo.sp_start_job 'JobName'"

```


Example: Kill Blocking Session


```sql 


KILL <session_id>;

```




PART 10: DAILY DBA DASHBOARD QUERY



```sql 


SELECT TOP 50 *

FROM DBA_Monitoring.dbo.Alerts

ORDER BY CreatedAt DESC;

```


PRODUCTION CHECKLIST


You are production-ready if:



SQL Level


* Monitoring job running


* Alerts being logged


* Emails working


AWS Level


* CloudWatch alarms active

* Disk + CPU monitored



Response Level


* Scripts tested


* Alerts actionable


---


Most outages happen because there is No alert or Alert ignored or Alert unclear.


No comments:

Post a Comment

BUILDING MONITORING & ALERTING SYSTEM After Migrating From On-prem SQL Server To SQL Server on AWS EC2

 BUILDING PRODUCTION MONITORING & ALERTING SYSTEM After Migrating From On-prem SQL Server To SQL Server on AWS EC2 This guides to build ...