Foundation Phase: Migrating from On-Prem Oracle Database to MongoDB on Azure VM
This guide is written for DBA support teams that must successfully migrate mission-critical, very large databases (10TB–100TB) from on-premises Oracle to MongoDB running on Azure Virtual Machines.
We will use simple English, real-world steps, and free tools wherever possible.
1. Understanding the Big Picture (Very Important First Step)
Before doing anything technical, you must understand one truth:
Oracle and MongoDB are completely different database systems
Oracle Database = Relational Database (tables, rows, columns, joins)
MongoDB = NoSQL Document Database (JSON-like documents, flexible schema)
This means:
You are not just migrating data
You are also transforming how the data is structured
This is called:
Schema Transformation (Relational → Document Model)
2. Migration Strategy Overview (Simple Roadmap)
Think of the migration in 3 major phases:
Phase 1: BEFORE Migration
- Assessment
- Planning
- Environment setup
- Schema design
Phase 2: DURING Migration
- Data extraction
- Transformation
- Data loading
- Validation
Phase 3: AFTER Migration
- Performance tuning
- Backup & disaster recovery
- Monitoring
- Optimization
3. Step-by-Step BEFORE Migration Tasks
Step 1: Assess Your Oracle Database (Discovery Phase)
This is the most critical step. If you skip this, the migration will fail.
3.1 Identify What You Have
Run this in Oracle:
-- Database size
SELECT SUM(bytes)/1024/1024/1024 AS size_gb FROM dba_data_files;
-- List schemas
SELECT username FROM dba_users;
-- Table sizes
SELECT owner, segment_name, SUM(bytes)/1024/1024/1024 AS size_gb
FROM dba_segments
WHERE segment_type='TABLE'
GROUP BY owner, segment_name
ORDER BY size_gb DESC;
This helps you:
Identify largest tables (priority for migration)
Understand data distribution
3.2 Identify Complex Objects
MongoDB (Community Edition) does NOT support:
Stored procedures
Triggers (change streams can cover some of these use cases)
Complex multi-table joins ($lookup handles only simple cases)
Run:
-- Stored procedures
SELECT object_name FROM dba_objects WHERE object_type='PROCEDURE';
-- Triggers
SELECT trigger_name FROM dba_triggers;
You must plan to move this logic into:
Application layer (Java, Python, etc.)
Or MongoDB aggregation pipelines
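As a sketch of the second option, a hypothetical stored procedure that sums paid order amounts per customer could become an aggregation pipeline. The pipeline is just a list of stage documents; with pymongo you would pass it to db.orders.aggregate(pipeline). The field names and the "PAID" status filter below are illustrative, not from the source schema.

```python
# Hypothetical replacement for an Oracle procedure that totals
# order amounts per customer. Each dict is one pipeline stage.
pipeline = [
    {"$match": {"status": "PAID"}},      # like a WHERE clause
    {"$group": {                          # like GROUP BY customer_id
        "_id": "$customer_id",
        "total_amount": {"$sum": "$amount"},
        "order_count": {"$sum": 1},
    }},
    {"$sort": {"total_amount": -1}},      # like ORDER BY total DESC
]
```

The same stages can be prototyped interactively in MongoDB Compass before being moved into application code.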
3.3 Identify Data Relationships
Relational DB uses:
Foreign keys
Joins
Run:
SELECT a.table_name, a.constraint_name, c.column_name, c.position
FROM dba_constraints a
JOIN dba_cons_columns c
ON a.constraint_name = c.constraint_name
WHERE a.constraint_type = 'R';
These relationships must be converted into:
Embedded documents OR
References in MongoDB
Step 2: Choose the Right MongoDB Data Model
This is where many fail.
Relational DB Example (Oracle)
Customers table
Orders table
MongoDB Design Options
Option 1: Embedded (Best for performance)
{
  "customer_id": 1,
  "name": "John",
  "orders": [
    { "order_id": 101, "amount": 200 },
    { "order_id": 102, "amount": 300 }
  ]
}
Option 2: Referenced (for very large data)
Customer:
{
  "customer_id": 1,
  "name": "John"
}
Orders:
{
  "order_id": 101,
  "customer_id": 1,
  "amount": 200
}
Rule of thumb:
Use embedded for small, frequently accessed data
Use reference for large or independent data
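The embedded option above can be produced mechanically from relational rows. A minimal sketch, assuming customer and order rows shaped like the examples in this section (the function name embed_orders is hypothetical):

```python
# Fold relational rows (customers + orders) into embedded
# MongoDB-style documents, grouping orders by their foreign key.
customers = [{"customer_id": 1, "name": "John"}]
orders = [
    {"order_id": 101, "customer_id": 1, "amount": 200},
    {"order_id": 102, "customer_id": 1, "amount": 300},
]

def embed_orders(customers, orders):
    # Index orders by customer_id so each customer lookup is O(1)
    by_customer = {}
    for o in orders:
        by_customer.setdefault(o["customer_id"], []).append(
            {"order_id": o["order_id"], "amount": o["amount"]}
        )
    # Attach each customer's orders as an embedded array
    return [
        {**c, "orders": by_customer.get(c["customer_id"], [])}
        for c in customers
    ]

docs = embed_orders(customers, orders)
```

The referenced option needs no such step: each table's rows simply become documents in their own collection.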
Step 3: Set Up Azure Virtual Machines
We will use:
Azure Virtual Machine (IaaS)
Install MongoDB manually (free)
3.1 Create Azure VM
Recommended:
OS: Ubuntu 22.04
VM Size:
For 10TB: at least 16 vCPU, 64GB RAM
For 100TB: 32+ vCPU, 128GB+ RAM
Use Premium SSD or Ultra Disk
3.2 Configure Storage (Critical for Performance)
MongoDB needs fast disk.
Format and mount the data disk (verify the device name with lsblk first, and add an /etc/fstab entry so the mount survives a reboot):
sudo mkfs.ext4 /dev/sdc
sudo mkdir /data
sudo mount /dev/sdc /data
3.3 Install MongoDB (Free Community Edition)
The mongodb package in Ubuntu's default repositories is outdated. Install the official mongodb-org package from MongoDB's repository instead:
curl -fsSL https://www.mongodb.com/static/pgp/server-7.0.asc | sudo gpg --dearmor -o /usr/share/keyrings/mongodb-server-7.0.gpg
echo "deb [ signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/7.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list
sudo apt-get update
sudo apt-get install -y mongodb-org
Start service:
sudo systemctl start mongod
sudo systemctl enable mongod
Step 4: Set Up MongoDB Replica Set (HADR Basics)
For mission-critical systems, you MUST use High Availability.
Minimum setup:
3 nodes (Primary + 2 Secondary)
4.1 Enable Replication
Edit config:
sudo nano /etc/mongod.conf
Add:
replication:
  replSetName: "rs0"
Restart:
sudo systemctl restart mongod
4.2 Initialize Replica Set
Connect with the MongoDB shell (the legacy mongo shell was removed in MongoDB 6.0):
mongosh
Run:
rs.initiate()
Add nodes:
rs.add("node2:27017")
rs.add("node3:27017")
Now you have:
Automatic failover
High availability (HADR)
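Applications should connect with a URI that lists all replica set members, so the driver can fail over automatically. A minimal sketch, assuming the first node's hostname is node1 (only node2 and node3 are named in the steps above); with pymongo you would pass this string to MongoClient:

```python
# Build a replica-set connection string for the three nodes.
# "node1" is an assumed hostname for the node where rs.initiate()
# was run; "rs0" matches the replSetName configured above.
nodes = ["node1:27017", "node2:27017", "node3:27017"]
uri = "mongodb://{hosts}/?replicaSet=rs0".format(hosts=",".join(nodes))
```

With replicaSet in the URI, the driver discovers the current primary and reroutes writes after a failover without application changes.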
Step 5: Choose Free Migration Tools (Very Important)
Here are the most commonly used free tools:
5.1 Oracle Data Export Tools
Oracle Data Pump (expdp / impdp)
SQL*Plus
Oracle SQL Developer
5.2 MongoDB Migration Tools (Free)
mongoimport
mongodump / mongorestore
MongoDB Compass (GUI)
MongoDB Relational Migrator (Free tool)
Best combination:
Oracle → CSV (using SQL Developer)
CSV → MongoDB (using mongoimport)
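For the CSV → MongoDB leg, one mongoimport invocation per exported file works well, and building the command in Python makes it easy to script over many files. A sketch, assuming illustrative database and collection names (in practice you would run the command with subprocess.run(cmd, check=True)):

```python
# Assemble a mongoimport command for one exported CSV file.
# --headerline tells mongoimport to use the first CSV row as
# field names; db/collection names here are illustrative.
def build_import_cmd(csv_path, db="shopdb", collection="customers"):
    return [
        "mongoimport",
        "--db", db,
        "--collection", collection,
        "--type", "csv",
        "--headerline",
        "--file", csv_path,
    ]

cmd = build_import_cmd("customers.csv")
```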
Step 6: Extract Data from Oracle
6.1 Export to CSV
Example:
SET COLSEP ','
SET PAGESIZE 0
SET HEADING OFF
SET FEEDBACK OFF
SPOOL customers.csv
SELECT * FROM customers;
SPOOL OFF;
For large tables (10TB+), DO NOT export all at once.
Use batching by primary-key range (a plain rownum filter would return the same first rows on every run):
SELECT * FROM customers WHERE customer_id BETWEEN 1 AND 1000000;
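The key ranges for those batched queries can be generated up front. A minimal sketch, assuming the table has a numeric primary key whose bounds you fetch with SELECT MIN(customer_id), MAX(customer_id) in Oracle:

```python
# Yield (lo, hi) primary-key ranges so each batch can be exported
# with a bounded query like:
#   SELECT * FROM customers WHERE customer_id BETWEEN :lo AND :hi
def id_batches(min_id, max_id, batch_size):
    lo = min_id
    while lo <= max_id:
        hi = min(lo + batch_size - 1, max_id)
        yield (lo, hi)
        lo = hi + 1

batches = list(id_batches(1, 2_500_000, 1_000_000))
```

Each range is independent, so batches can also be exported in parallel or retried individually after a failure.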
Step 7: Prepare Data for MongoDB (Transformation Phase)
This step is often ignored but critical.
7.1 Convert to JSON Format
Example transformation:
From CSV:
1,John,USA
To JSON:
{ "id": 1, "name": "John", "country": "USA" }
7.2 Use Python for Automation
import csv
import json

# Read the exported CSV into a list of row dictionaries
with open('customers.csv') as f:
    reader = csv.DictReader(f)
    data = list(reader)

# Note: DictReader returns every value as a string; convert numeric
# fields (e.g. int(row['id'])) before loading if types matter.
with open('customers.json', 'w') as f:
    json.dump(data, f)
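That script holds the whole file in memory and writes one big JSON array, which does not scale to multi-terabyte exports. A streaming variant that writes newline-delimited JSON (a format mongoimport reads directly) keeps memory flat; the int_fields default below is an assumption about which columns are numeric:

```python
import csv
import json

# Convert CSV to newline-delimited JSON one row at a time, so memory
# use stays constant regardless of file size. Known numeric columns
# are cast from string to int (DictReader yields only strings).
def csv_to_ndjson(src, dst, int_fields=("id",)):
    with open(src, newline="") as fin, open(dst, "w") as fout:
        for row in csv.DictReader(fin):
            for field in int_fields:
                if field in row:
                    row[field] = int(row[field])
            fout.write(json.dumps(row) + "\n")
```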
Step 8: Plan for Large Data Migration (10TB–100TB)
This is where real-world complexity begins.
8.1 Use Parallel Migration
Split data:
By ID ranges
By date
By partitions
8.2 Example Strategy
Node 1 → Customers
Node 2 → Orders
Node 3 → Transactions
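The per-table split above can be driven from a small coordinator script. A sketch using a thread pool, where migrate_table is a placeholder for your real extract-transform-load routine (here it just reports completion):

```python
from concurrent.futures import ThreadPoolExecutor

# Run one migration job per table in parallel, mirroring the
# customers / orders / transactions split above.
def migrate_table(table):
    # Placeholder: extract from Oracle, transform, load into MongoDB
    return f"{table}: done"

tables = ["customers", "orders", "transactions"]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(migrate_table, tables))
```

pool.map preserves input order, so results line up with the tables list for reporting.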
8.3 Use Compression
gzip customers.json
Step 9: Network Planning (Often Ignored but Critical)
For large migrations:
Use Azure VPN or ExpressRoute
Avoid public internet
Step 10: Risk Planning
Common failures:
Network interruption
Disk full
Data corruption
Timeout errors
Mitigation Strategy
Always migrate in small batches
Always validate data after each batch
Always keep Oracle as fallback
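Per-batch validation can be as simple as comparing row counts plus an order-independent checksum over key fields. A sketch, assuming the two inputs come from an Oracle query and a MongoDB find() for the same key range (field names are illustrative):

```python
import hashlib

# Order-independent checksum: XOR the SHA-256 of each row's key
# fields, so Oracle and MongoDB need not return rows in the same order.
def batch_checksum(rows, key_fields=("customer_id", "amount")):
    digest = 0
    for row in rows:
        payload = "|".join(str(row[f]) for f in key_fields)
        digest ^= int(hashlib.sha256(payload.encode()).hexdigest(), 16)
    return digest

def batch_matches(oracle_rows, mongo_docs):
    # A batch passes only if both the count and the checksum agree
    return (len(oracle_rows) == len(mongo_docs)
            and batch_checksum(oracle_rows) == batch_checksum(mongo_docs))
```

A failed batch is then re-extracted and re-loaded on its own, without touching batches that already passed.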