Monday, May 11, 2026

Foundation Phase: Migrating from On-Prem Oracle Database to MongoDB on Azure VM

Foundation Phase: Migrating from On-Prem Oracle Database to MongoDB on Azure VM 

This guide is written for DBA support, who must be successfully migrate mission-critical, very large databases (10TB–100TB) from Oracle (on-premises) to MongoDB running on Azure Virtual Machines.

We will use simple English, real-world steps, and free tools wherever possible.


1. Understanding the Big Picture (Very Important First Step)

Before doing anything technical, you must understand one truth:

Oracle and MongoDB are completely different database systems

  • Oracle Database = Relational Database (tables, rows, columns, joins)

  • MongoDB = NoSQL Document Database (JSON-like documents, flexible schema)

This means:

You are not just migrating data
You are also transforming how the data is structured

This is called:

Schema Transformation (Relational → Document Model)


2. Migration Strategy Overview (Simple Roadmap)

Think of the migration in 3 major phases:

Phase 1: BEFORE Migration

  • Assessment
  • Planning
  • Environment setup
  • Schema design

Phase 2: DURING Migration

  • Data extraction
  • Transformation
  • Data loading
  • Validation

Phase 3: AFTER Migration

  • Performance tuning
  • Backup & disaster recovery
  • Monitoring
  • Optimization

3. Step-by-Step BEFORE Migration Tasks


Step 1: Assess Your Oracle Database (Discovery Phase)

This is the most critical step. If you skip this, the migration will fail.

3.1 Identify What You Have

Run this in Oracle:

-- Database size
SELECT SUM(bytes)/1024/1024/1024 AS size_gb FROM dba_data_files;

-- List schemas
SELECT username FROM dba_users;

-- Table sizes
SELECT owner, segment_name, SUM(bytes)/1024/1024/1024 AS size_gb
FROM dba_segments
WHERE segment_type='TABLE'
GROUP BY owner, segment_name
ORDER BY size_gb DESC;

This helps you:

  • Identify largest tables (priority for migration)

  • Understand data distribution


3.2 Identify Complex Objects

MongoDB does NOT support:

  • Stored procedures

  • Triggers

  • Complex joins

Run:

-- Stored procedures
SELECT object_name FROM dba_objects WHERE object_type='PROCEDURE';

-- Triggers
SELECT trigger_name FROM dba_triggers;

You must plan to move this logic into:

  • Application layer (Java, Python, etc.)

  • Or MongoDB aggregation pipelines


3.3 Identify Data Relationships

Relational DB uses:

  • Foreign keys

  • Joins

Run:

SELECT a.table_name, a.constraint_name, c.column_name, c.position
FROM dba_constraints a
JOIN dba_cons_columns c
ON a.constraint_name = c.constraint_name
WHERE a.constraint_type = 'R';

These relationships must be converted into:

  • Embedded documents OR

  • References in MongoDB


Step 2: Choose the Right MongoDB Data Model

This is where many fail.

Relational DB Example (Oracle)

  • Customers table

  • Orders table

MongoDB Design Options

Option 1: Embedded (Best for performance)

{
  "customer_id": 1,
  "name": "John",
  "orders": [
    { "order_id": 101, "amount": 200 },
    { "order_id": 102, "amount": 300 }
  ]
}

Option 2: Referenced (for very large data)

Customer:
{
  "customer_id": 1,
  "name": "John"
}

Orders:
{
  "order_id": 101,
  "customer_id": 1,
  "amount": 200
}

 Rule of thumb:

  • Use embedded for small, frequently accessed data

  • Use reference for large or independent data


Step 3: Set Up Azure Virtual Machines 

We will use:

  • Azure Virtual Machine (IaaS)

  • Install MongoDB manually (free)


3.1 Create Azure VM

Recommended:

  • OS: Ubuntu 22.04

  • VM Size:

    • For 10TB: at least 16 vCPU, 64GB RAM

    • For 100TB: 32+ vCPU, 128GB+ RAM

Use Premium SSD or Ultra Disk


3.2 Configure Storage (Critical for Performance)

MongoDB needs fast disk.

Mount disk:

sudo mkfs.ext4 /dev/sdc
sudo mkdir /data
sudo mount /dev/sdc /data

3.3 Install MongoDB (Free Community Edition)

sudo apt-get update
sudo apt-get install -y mongodb

Start service:

sudo systemctl start mongodb
sudo systemctl enable mongodb

Step 4: Set Up MongoDB Replica Set (HADR Basics)

For mission-critical systems, you MUST use High Availability.

Minimum setup:

  • 3 nodes (Primary + 2 Secondary)


4.1 Enable Replication

Edit config:

sudo nano /etc/mongod.conf

Add:

replication:
  replSetName: "rs0"

Restart:

sudo systemctl restart mongodb

4.2 Initialize Replica Set

Connect:

mongo

Run:

rs.initiate()

Add nodes:

rs.add("node2:27017")
rs.add("node3:27017")

Now you have:

  • Automatic failover

  • High availability (HADR)


Step 5: Choose Free Migration Tools (Very Important)

Here are the most commonly used free tools:


5.1 Oracle Data Export Tools

  • Oracle Data Pump (expdp / impdp)

  • SQL*Plus

  • Oracle SQL Developer


5.2 MongoDB Migration Tools (Free)

  • mongoimport

  • mongodump / mongorestore

  • MongoDB Compass (GUI)

  • MongoDB Relational Migrator (Free tool)


Best combination:

  1. Oracle → CSV (using SQL Developer)

  2. CSV → MongoDB (using mongoimport)


Step 6: Extract Data from Oracle


6.1 Export to CSV

Example:

SET COLSEP ','
SET HEADING OFF
SET FEEDBACK OFF

SPOOL customers.csv

SELECT * FROM customers;

SPOOL OFF;

For large tables (10TB+), DO NOT export all at once.

Use batching:

SELECT * FROM customers WHERE rownum <= 1000000;

Step 7: Prepare Data for MongoDB (Transformation Phase)

This step is often ignored but critical.


7.1 Convert to JSON Format

Example transformation:

From CSV:

1,John,USA

To JSON:

{ "id": 1, "name": "John", "country": "USA" }

7.2 Use Python for Automation

import csv
import json

with open('customers.csv') as f:
    reader = csv.DictReader(f)
    data = list(reader)

with open('customers.json', 'w') as f:
    json.dump(data, f)

Step 8: Plan for Large Data Migration (10TB–100TB)

This is where real-world complexity begins.


8.1 Use Parallel Migration

Split data:

  • By ID ranges

  • By date

  • By partitions


8.2 Example Strategy

  • Node 1 → Customers

  • Node 2 → Orders

  • Node 3 → Transactions


8.3 Use Compression

gzip customers.json

Step 9: Network Planning (Often Ignored but Critical)

For large migrations:

  • Use Azure VPN or ExpressRoute

  • Avoid public internet


Step 10: Risk Planning 

Common failures:

Network interruption

Disk full

Data corruption

Timeout errors


Mitigation Strategy

  • Always migrate in small batches

  • Always validate data after each batch

  • Always keep Oracle as fallback

No comments:

Post a Comment

The Most Common Challenges Pre-Migration of Oracle Databases to MongoDB on Azure VM

The Most Common Challenges Pre-Migration of Oracle Databases to MongoDB on Azure VM   Migrating from a monolithic, relational giant like Ora...