Monday, May 11, 2026

Foundation Phase: Migrating from On-Prem Oracle Database to MongoDB on Azure VM

Foundation Phase: Migrating from On-Prem Oracle Database to MongoDB on Azure VM 

This guide is written for DBA support, who must be successfully migrate mission-critical, very large databases (10TB–100TB) from Oracle (on-premises) to MongoDB running on Azure Virtual Machines.

We will use simple English, real-world steps, and free tools wherever possible.


1. Understanding the Big Picture (Very Important First Step)

Before doing anything technical, you must understand one truth:

Oracle and MongoDB are completely different database systems

  • Oracle Database = Relational Database (tables, rows, columns, joins)

  • MongoDB = NoSQL Document Database (JSON-like documents, flexible schema)

This means:

You are not just migrating data
You are also transforming how the data is structured

This is called:

Schema Transformation (Relational → Document Model)


2. Migration Strategy Overview (Simple Roadmap)

Think of the migration in 3 major phases:

Phase 1: BEFORE Migration

  • Assessment
  • Planning
  • Environment setup
  • Schema design

Phase 2: DURING Migration

  • Data extraction
  • Transformation
  • Data loading
  • Validation

Phase 3: AFTER Migration

  • Performance tuning
  • Backup & disaster recovery
  • Monitoring
  • Optimization

3. Step-by-Step BEFORE Migration Tasks


Step 1: Assess Your Oracle Database (Discovery Phase)

This is the most critical step. If you skip this, the migration will fail.

3.1 Identify What You Have

Run this in Oracle:

-- Database size
SELECT SUM(bytes)/1024/1024/1024 AS size_gb FROM dba_data_files;

-- List schemas
SELECT username FROM dba_users;

-- Table sizes
SELECT owner, segment_name, SUM(bytes)/1024/1024/1024 AS size_gb
FROM dba_segments
WHERE segment_type='TABLE'
GROUP BY owner, segment_name
ORDER BY size_gb DESC;

This helps you:

  • Identify largest tables (priority for migration)

  • Understand data distribution


3.2 Identify Complex Objects

MongoDB does NOT support:

  • Stored procedures

  • Triggers

  • Complex joins

Run:

-- Stored procedures
SELECT object_name FROM dba_objects WHERE object_type='PROCEDURE';

-- Triggers
SELECT trigger_name FROM dba_triggers;

You must plan to move this logic into:

  • Application layer (Java, Python, etc.)

  • Or MongoDB aggregation pipelines


3.3 Identify Data Relationships

Relational DB uses:

  • Foreign keys

  • Joins

Run:

SELECT a.table_name, a.constraint_name, c.column_name, c.position
FROM dba_constraints a
JOIN dba_cons_columns c
ON a.constraint_name = c.constraint_name
WHERE a.constraint_type = 'R';

These relationships must be converted into:

  • Embedded documents OR

  • References in MongoDB


Step 2: Choose the Right MongoDB Data Model

This is where many fail.

Relational DB Example (Oracle)

  • Customers table

  • Orders table

MongoDB Design Options

Option 1: Embedded (Best for performance)

{
  "customer_id": 1,
  "name": "John",
  "orders": [
    { "order_id": 101, "amount": 200 },
    { "order_id": 102, "amount": 300 }
  ]
}

Option 2: Referenced (for very large data)

Customer:
{
  "customer_id": 1,
  "name": "John"
}

Orders:
{
  "order_id": 101,
  "customer_id": 1,
  "amount": 200
}

 Rule of thumb:

  • Use embedded for small, frequently accessed data

  • Use reference for large or independent data


Step 3: Set Up Azure Virtual Machines 

We will use:

  • Azure Virtual Machine (IaaS)

  • Install MongoDB manually (free)


3.1 Create Azure VM

Recommended:

  • OS: Ubuntu 22.04

  • VM Size:

    • For 10TB: at least 16 vCPU, 64GB RAM

    • For 100TB: 32+ vCPU, 128GB+ RAM

Use Premium SSD or Ultra Disk


3.2 Configure Storage (Critical for Performance)

MongoDB needs fast disk.

Mount disk:

sudo mkfs.ext4 /dev/sdc
sudo mkdir /data
sudo mount /dev/sdc /data

3.3 Install MongoDB (Free Community Edition)

sudo apt-get update
sudo apt-get install -y mongodb

Start service:

sudo systemctl start mongodb
sudo systemctl enable mongodb

Step 4: Set Up MongoDB Replica Set (HADR Basics)

For mission-critical systems, you MUST use High Availability.

Minimum setup:

  • 3 nodes (Primary + 2 Secondary)


4.1 Enable Replication

Edit config:

sudo nano /etc/mongod.conf

Add:

replication:
  replSetName: "rs0"

Restart:

sudo systemctl restart mongodb

4.2 Initialize Replica Set

Connect:

mongo

Run:

rs.initiate()

Add nodes:

rs.add("node2:27017")
rs.add("node3:27017")

Now you have:

  • Automatic failover

  • High availability (HADR)


Step 5: Choose Free Migration Tools (Very Important)

Here are the most commonly used free tools:


5.1 Oracle Data Export Tools

  • Oracle Data Pump (expdp / impdp)

  • SQL*Plus

  • Oracle SQL Developer


5.2 MongoDB Migration Tools (Free)

  • mongoimport

  • mongodump / mongorestore

  • MongoDB Compass (GUI)

  • MongoDB Relational Migrator (Free tool)


Best combination:

  1. Oracle → CSV (using SQL Developer)

  2. CSV → MongoDB (using mongoimport)


Step 6: Extract Data from Oracle


6.1 Export to CSV

Example:

SET COLSEP ','
SET HEADING OFF
SET FEEDBACK OFF

SPOOL customers.csv

SELECT * FROM customers;

SPOOL OFF;

For large tables (10TB+), DO NOT export all at once.

Use batching:

SELECT * FROM customers WHERE rownum <= 1000000;

Step 7: Prepare Data for MongoDB (Transformation Phase)

This step is often ignored but critical.


7.1 Convert to JSON Format

Example transformation:

From CSV:

1,John,USA

To JSON:

{ "id": 1, "name": "John", "country": "USA" }

7.2 Use Python for Automation

import csv
import json

with open('customers.csv') as f:
    reader = csv.DictReader(f)
    data = list(reader)

with open('customers.json', 'w') as f:
    json.dump(data, f)

Step 8: Plan for Large Data Migration (10TB–100TB)

This is where real-world complexity begins.


8.1 Use Parallel Migration

Split data:

  • By ID ranges

  • By date

  • By partitions


8.2 Example Strategy

  • Node 1 → Customers

  • Node 2 → Orders

  • Node 3 → Transactions


8.3 Use Compression

gzip customers.json

Step 9: Network Planning (Often Ignored but Critical)

For large migrations:

  • Use Azure VPN or ExpressRoute

  • Avoid public internet


Step 10: Risk Planning 

Common failures:

Network interruption

Disk full

Data corruption

Timeout errors


Mitigation Strategy

  • Always migrate in small batches

  • Always validate data after each batch

  • Always keep Oracle as fallback

No comments:

Post a Comment

MINUTE BY MINUITE PRODUCTION RUNBOOK FOR FULLY AUTOMATED MIGRATION FROM SAP ASE TO SQL Server Azure VM

MINUTE BY MINUITE PRODUCTION RUNBOOK FOR  FULLY AUTOMATED MIGRATION FROM SAP ASE TO SQL Server Azure VM --- OVERALL STRUCTURE Breaking execu...