Making-sense of the Evolution of Data, and Database Technology: BUILDING MONITORING & ALERTING SYSTEM After Migrating From On-prem SQL Server To SQL Server on AWS EC2

BUILDING PRODUCTION MONITORING & ALERTING SYSTEM After Migrating From On-prem SQL Server To SQL Server on AWS EC2

This guides to build a production-ready monitoring system that can:

* Detects problems early

* Alerts the right people

* Helps you diagnose fast

* Prevents outages

BIG PICTURE ARCHITECTURE

You are building 4 layers:

1. SQL Server Internal Monitoring (inside SQL Server)

2. OS + EC2 Monitoring(via AWS)

3. Central Logging & Alerting (SQL Agent + email + logs)

4. Response Automation(PowerShell + alerts)

WHAT YOU MUST MONITOR (CRITICAL SIGNALS)

DATABASE HEALTH:

* Database state (ONLINE/OFFLINE)

* Corruption

* Backup success/failure

PERFORMANCE:

* CPU usage

* Slow queries

* Blocking / deadlocks

STORAGE:

* Disk space

* IOPS latency

AVAILABILITY:

* Always On status

* Connection failures

PART 1: SQL SERVER MONITORING (CORE)

1.1 Create Monitoring Database

```sql

CREATE DATABASE DBA_Monitoring;

```

---

1.2 Table for Alerts

```sql

USE DBA_Monitoring;

CREATE TABLE Alerts (

AlertID INT IDENTITY(1,1),

AlertType NVARCHAR(100),

Message NVARCHAR(MAX),

Severity INT,

CreatedAt DATETIME DEFAULT GETDATE()

);

```

PART 2: CRITICAL MONITORING QUERIES

2.1 Check Database Status

```sql

INSERT INTO DBA_Monitoring.dbo.Alerts (AlertType, Message, Severity)

SELECT

'Database Offline',

name + ' is not ONLINE',

FROM sys.databases

WHERE state_desc <> 'ONLINE';

```

2.2 Check Failed Backups

```sql

INSERT INTO DBA_Monitoring.dbo.Alerts (AlertType, Message, Severity)

SELECT

'Backup Failure',

'No backup in last 24 hours',

WHERE NOT EXISTS (

SELECT 1 FROM msdb.dbo.backupset

WHERE backup_finish_date > DATEADD(HOUR, -24, GETDATE())

);

```

2.3 Check Disk Space

```sql

EXEC xp_fixeddrives;

```

Alert if free space < 20%

2.4 Check Blocking

```sql

SELECT blocking_session_id, session_id

FROM sys.dm_exec_requests

WHERE blocking_session_id <> 0;

```

2.5 Check Long Running Queries

```sql

SELECT TOP 10

total_elapsed_time / execution_count AS AvgTime,

execution_count

FROM sys.dm_exec_query_stats

ORDER BY AvgTime DESC;

```

2.6 Deadlock Detection

```sql

DBCC TRACEON (1222, -1);

```

PART 3: SQL AGENT ALERT JOB

3.1 Create Monitoring Job

```sql

USE msdb;

EXEC sp_add_job @job_name = 'DBA_Monitoring_Job';

```

3.2 Add Step

```sql

EXEC sp_add_jobstep

@job_name = 'DBA_Monitoring_Job',

@step_name = 'Run Checks',

@subsystem = 'TSQL',

@command = '

EXEC check_script for Database Status;

EXEC check_script for Backup Failure;

```

3.3 Schedule Every 5 Minutes

```sql

EXEC sp_add_schedule

@schedule_name = 'Every5Min',

@freq_type = 4,

@freq_subday_type = 4,

@freq_subday_interval = 5;

```

PART 4: EMAIL ALERTING SYSTEM

4.1 Configure Database Mail

```sql

EXEC sp_configure 'Database Mail XPs', 1;

RECONFIGURE;

```

4.2 Send Alert Email

```sql

EXEC msdb.dbo.sp_send_dbmail

@profile_name = 'DBA_Profile',

@recipients = 'dba@company.com',

@subject = 'SQL Alert',

@body = 'Critical issue detected';

```

4.3 Auto Email on Alert

```sql

IF EXISTS (SELECT 1 FROM DBA_Monitoring.dbo.Alerts WHERE Severity = 1)

BEGIN

EXEC msdb.dbo.sp_send_dbmail

@profile_name = 'DBA_Profile',

@recipients = 'dba@company.com',

@subject = 'CRITICAL SQL ALERT',

@body = 'Check DBA_Monitoring database immediately';

END

```

PART 5: AWS CLOUDWATCH MONITORING

5.1 Key Metrics to Enable

Monitor in AWS:

* CPUUtilization

* DiskReadOps / WriteOps

* NetworkIn / Out

5.2 CloudWatch Alarm Example

Trigger alert if:

* CPU > 80% for 5 minutes

5.3 Disk Alert

Trigger if:

* Free space < 15%

PART 6: POWERSHELL ALERT AUTOMATION

6.1 Check Disk Space Script

```powershell

$drives = Get-PSDrive -PSProvider FileSystem

foreach ($drive in $drives) {

if ($drive.Free -lt 20GB) {

Write-Host "Low disk space on $($drive.Name)"

}

```

6.2 Send Alert

```powershell

Send-MailMessage -To "dba@company.com" `

-From "alert@company.com" `

-Subject "Disk Alert" `

-Body "Low disk space detected" `

-SmtpServer "smtp.server.com"

```

PART 7: ALWAYS ON MONITORING

7.1 Check Replica Health

```sql

SELECT

replica_server_name,

synchronization_state_desc

FROM sys.dm_hadr_database_replica_states;

```

Alert if: NOT SYNCHRONIZED

PART 8: ALERT SEVERITY MODEL

Severity Levels

* 1 (Critical) → Immediate action (DB down, corruption)

* 2 (Warning) → Investigate (slow queries)

* 3 (Info) → Logging only

ART 9: RESPONSE AUTOMATION

Example: Restart Failed Job

```powershell

Invoke-Sqlcmd -Query "EXEC msdb.dbo.sp_start_job 'JobName'"

```

Example: Kill Blocking Session

```sql

KILL <session_id>;

```

PART 10: DAILY DBA DASHBOARD QUERY

```sql

SELECT TOP 50 *

FROM DBA_Monitoring.dbo.Alerts

ORDER BY CreatedAt DESC;

```

PRODUCTION CHECKLIST

You are production-ready if:

SQL Level

* Monitoring job running

* Alerts being logged

* Emails working

AWS Level

* CloudWatch alarms active

* Disk + CPU monitored

Response Level

* Scripts tested

* Alerts actionable

---

Most outages happen because there is No alert or Alert ignored or Alert unclear.

Making-sense of the Evolution of Data, and Database Technology

Wednesday, May 6, 2026

BUILDING MONITORING & ALERTING SYSTEM After Migrating From On-prem SQL Server To SQL Server on AWS EC2

No comments:

Post a Comment

MINUTE BY MINUITE PRODUCTION RUNBOOK FOR FULLY AUTOMATED MIGRATION FROM SAP ASE TO SQL Server Azure VM

Report Abuse

Labels