Tuesday, March 24, 2026

Developing the Right Questions for Database Design in SQL Server-Classified Version

 

Developing the Right Questions for Database Design in SQL Server


Database design is the silent architect of an application's success. As a SQL Server Database Administrator (DBA), your role is not just to maintain the engine but to ensure the foundation is built to withstand the pressures of high-traffic, complex logic, and ever-growing data. A common mistake is jumping straight into CREATE TABLE scripts before understanding the "soul" of the business process.

The following guide outlines the 15 most critical questions a DBA must ask business stakeholders and application developers. Structured by criticality, each question addresses the What, Why, How, and When to ensure your SQL Server environment remains performant, secure, and scalable.


Phase 1: High-Level Architecture and Data Flow

1. What is the Primary Workload Type (OLTP vs. OLAP)?

  • What: Identifying if the database is for transaction processing (many small writes/reads) or analytical processing (few large reads).

  • Why: SQL Server configuration, indexing strategies, and hardware requirements differ vastly between the two.

  • How: Observe if the app records individual sales (OLTP) or generates quarterly reports (OLAP).

  • When to use: Use OLTP for real-time apps; OLAP for data warehousing. Don't use OLTP design (high normalization) for reporting as it kills join performance.

2. What are the Recovery Time (RTO) and Recovery Point (RPO) Objectives?

  • What: RTO is how long you can be down; RPO is how much data you can afford to lose (in minutes/hours).

  • Why: This dictates your Backup Strategy and High Availability (HA) features like Always On Availability Groups.

  • How: Ask business: "If we crash, how many minutes of work can you lose?"

  • When to use: Use Full Recovery Model for low RPO. Don't use Simple Recovery for mission-critical production data.

3. What are the Anticipated Data Growth Rates for the Next 3 Years?

  • What: An estimate of monthly and yearly record counts.

  • Why: To plan disk capacity, filegroup placement, and partitioning.

  • How: Look at historical data or user growth projections.

  • When to use: Use Partitioning if tables exceed hundreds of millions of rows. Don't use it for small tables, as it adds unnecessary complexity.


Phase 2: Schema Integrity and Relationships

4. What are the Unique Identifiers for Each Entity?

  • What: Determining the Primary Key (PK) for every table.

  • Why: Without a PK, data integrity fails and performance suffers due to lack of a Clustered Index.

  • How: Ask, "What piece of info uniquely identifies one customer/order?"

  • When to use: Use INT/BIGINT IDENTITY or NEWSEQUENTIALID() for performance. Don't use random GUIDs as clustered keys to avoid fragmentation.

5. How are the Nouns (Entities) Related to One Another?

  • What: Defining Cardinality (One-to-One, One-to-Many, Many-to-Many).

  • Why: This determines where Foreign Keys (FKs) live and whether junction tables are needed.

  • How: Use an Entity-Relationship (ER) diagram to map links.

  • When to use: Use Foreign Keys to enforce referential integrity. Don't use "soft links" (logic in code) if you want to prevent orphaned records.

6. What Business Rules Must Be Enforced at the Data Level?

  • What: Identifying constraints like "Quantity cannot be negative" or "Start Date must be before End Date."

  • Why: Business logic in the app can be bypassed; constraints in the DB are final.

  • How: Implement CHECK constraints or DEFAULT values.

  • When to use: Use Check Constraints for simple logic. Don't use Triggers for logic that a simple constraint can handle—triggers are "silent" and harder to debug.


Phase 3: Performance and Query Patterns

7. What are the Top 5 Most Frequent Query Patterns?

  • What: Identifying the "bread and butter" searches of the application.

  • Why: To design the most effective Non-Clustered Indexes.

  • How: Ask developers for the WHERE and JOIN clauses they use most.

  • When to use: Use Covering Indexes (using INCLUDE) for these top queries. Don't use indexes on every column—it slows down INSERT and UPDATE operations.

8. How Much "Old" Data Must Stay Online and Queryable?

  • What: A data retention and archiving policy.

  • Why: Keeping 10 years of logs in the same table as active orders slows down everything.

  • How: Determine a "cutoff" point for moving data to history tables.

  • When to use: Use Temporal Tables or automated archiving jobs. Don't use a single "God table" for both active and archived data.

9. Will the Application Use Ad-hoc SQL or Stored Procedures?

  • What: Deciding how the application talks to SQL Server.

  • Why: Stored Procedures offer better security (permission masking) and plan stability.

  • How: Standardize the API layer between the App and DB.

  • When to use: Use Stored Procedures for complex logic. Don't use dynamic SQL concatenated in the app code, as it opens the door to SQL Injection.


Phase 4: Security and Compliance

10. Which Columns Contain PII or Sensitive Information?

  • What: Identifying data like SSNs, Credit Cards, or Emails.

  • Why: Triggers compliance needs (GDPR, HIPAA) and specific SQL features like Encryption.

  • How: Tag columns during the design phase.

  • When to use: Use Always Encrypted or Dynamic Data Masking. Don't use plain text for sensitive passwords; use salted hashes.

11. Who (or What) Will Access the Database?

  • What: Defining the security model (Service accounts vs. User accounts).

  • Why: To implement the "Principle of Least Privilege."

  • How: Use Windows Authentication and Database Roles.

  • When to use: Use Contained Database Users for easier migrations. Don't use the sa account for application connections.


Phase 5: Operations and Maintenance

12. When is the "Quiet Period" for Maintenance?

  • What: Identifying the low-traffic window for backups, index rebuilds, and updates.

  • Why: Even "online" maintenance carries a performance overhead.

  • How: Check global user time zones and peak business hours.

  • When to use: Schedule DBCC CHECKDB and index maintenance here. Don't use heavy maintenance during peak ETL or business hours.

13. What is the Strategy for Handling Concurrency and Locking?

  • What: Choosing Isolation Levels (e.g., Read Committed vs. Snapshot).

  • Why: To prevent "blocking" where one user stops another from working.

  • How: Use READ_COMMITTED_SNAPSHOT ON for modern web apps.

  • When to use: Use Snapshot Isolation to prevent reads from blocking writes. Don't use NOLOCK hints as a permanent fix—it can lead to dirty reads and skipped data.

14. Are There Requirements for Real-time Data Integration?

  • What: Needing to push data to other systems (CRMs, 3rd party APIs).

  • Why: Determines if you need SSIS, Replication, or Change Data Capture (CDC).

  • How: Ask, "Does the warehouse need this data every 5 minutes or every 24 hours?"

  • When to use: Use CDC for tracking changes without heavy triggers. Don't use complex triggers for data auditing if CDC is available.

15. What are the Deployment and Version Control Plans?

  • What: How schema changes will be moved from Dev to Production.

  • Why: Manual changes in "Prod" lead to configuration drift and outages.

  • How: Use SQL Server Data Tools (SSDT) or migration-based tools (Flyway/Redgate).

  • When to use: Use Dacpacs for consistent deployments. Don't use manual "copy-paste" scripts for production updates.


Conclusion

Building a database without these questions is like sailing without a map. By addressing the What, Why, How, and When of these 15 critical areas, a DBA transforms from a mere "data janitor" into a strategic partner. This structured approach ensures that the SQL Server environment is not just a repository for data, but a high-performance engine that scales with the business.

No comments:

Post a Comment

Native, Open Source and Community Power-Tools for SQL Server Capacity Planning

 Native, Open Source and Community Power-Tools for SQL Server Capacity Planning Managing a SQL Server estate with thousands of instances an...